Hadoop 3.3.1 stand-alone and pseudo-distributed installation

Stand-alone and pseudo-distributed installation of Hadoop 3.3.1 under Ubuntu

The installation steps in this article follow the book "Big Data Technology Principles and Applications" by Lin Ziyu. The article records the problems the author encountered while configuring Hadoop according to the book, including problems caused by version changes, and gives the corresponding solutions.

Single-machine installation of Hadoop 3.3.1 under Ubuntu

  1. First, create a new user on your Ubuntu system to provide a relatively isolated experimental environment (this article assumes the reader's machine runs a Linux system).

The command to create a user is as follows

$ sudo useradd -m hadoop -s /bin/bash

Then set a password for the new user

$ sudo passwd hadoop

Finally add admin privileges for the new user

$ sudo adduser hadoop sudo

This creates a new hadoop experimental user under Linux. Next, restart the system and log in as the hadoop user.

$ sudo shutdown -r

PS: This is the Linux restart command.

PS: The author did not encounter any errors with the commands above. If any reader runs into errors, please leave a comment.
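As an optional check (not a step from the book), the following commands can confirm that the new user exists and belongs to the sudo group:

$ id hadoop
$ groups hadoop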

  2. Update apt and install the vim editor
    PS: If this is your first time using a Linux system, it is recommended to switch apt to a faster mirror first. This is not covered in detail here.

Update apt command as follows

$ sudo apt-get update

The command to install the vim editor is as follows

$ sudo apt-get install vim

Confirmation is required during installation; just enter y at the (y/n) prompt.
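Optionally, apt-get accepts a -y flag that answers the confirmation automatically, which is convenient when repeating the setup:

$ sudo apt-get install -y vim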

  3. Install SSH and configure SSH passwordless login

The SSH client is installed by default on the Ubuntu system, so you only need to install the SSH server. The command is as follows:

$ sudo apt-get install openssh-server

Again, enter y when prompted for confirmation.

After installation, use the following command to log in to the machine:

$ ssh localhost

When prompted, enter yes and then the local password to log in. Notice that a password is required on every login, so next configure passwordless login.

Log out of the previous login

$ exit

Use ssh-keygen to generate a key and add the key to the authorization. The command is as follows:

$ cd ~/.ssh/
$ ssh-keygen -t rsa

PS: After entering the second command, just keep pressing Enter at every prompt; do not type any extra characters. The subsequent command is as follows:

$ cat ./id_rsa.pub >> ./authorized_keys

PS: This command is entered on a single line; do not split it across lines.

At this time, use ssh localhost to log in without a password.
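If passwordless login still does not work, a common cause is overly permissive file permissions on the key files. A possible fix (a hedged suggestion based on general SSH behaviour, not a step from the book) is:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys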

  4. Install the Java environment
    PS: Downloading the JDK installation package is not covered in detail here. This article assumes JDK 1.8, with the compressed package downloaded to ~/Downloads.

Execute the following command to create the "/usr/lib/jvm" directory to store files:

$ cd /usr/lib
$ sudo mkdir jvm

Execute the following command to decompress the installation package:

$ cd ~
$ cd Downloads
$ sudo tar -zxvf ./jdk-8u301-linux-x64.tar.gz -C /usr/lib/jvm
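As a quick check, listing the target directory should show the extracted JDK folder, whose name is used in the environment variables below (the exact name follows the downloaded package version):

$ ls /usr/lib/jvm
jdk1.8.0_301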

After decompression, set the environment variables:

$ vim ~/.bashrc

Add the following lines at the beginning of the file:
PS: After opening vim, press i to enter insert (editing) mode. After adding the content, press Esc to leave insert mode, then type :wq! and press Enter to save and exit vim.

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_301
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

Save and exit vim as described above, then execute the following command to make the changes take effect:

$ source ~/.bashrc

Now use the following command to check whether Java is installed and configured successfully:

$ java -version
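For reference, the output should look roughly like the following (the exact build numbers depend on the downloaded JDK package):

java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)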

At this point, the basic configuration process is over. Next, configure Hadoop.

Install stand-alone Hadoop

When Hadoop runs in stand-alone mode, everything runs on a single machine, storage uses the local file system, and HDFS is not involved.

Hadoop 3.3.1 is used in the following installation steps.

Download the 3.3.1 installation package from the Hadoop official website (https://hadoop.apache.org/releases.html) into the ~/Downloads directory. The package is named hadoop-3.3.1.tar.gz. Then proceed with the installation.

PS: The following steps assume the installation package is saved in ~/Downloads.

$ sudo tar -zxf ~/Downloads/hadoop-3.3.1.tar.gz -C /usr/local

At this point, Hadoop has been decompressed to the specified directory. Next, rename the extracted directory and grant permissions:

$ cd /usr/local
$ sudo mv ./hadoop-3.3.1/ ./hadoop
$ sudo chown -R hadoop ./hadoop

At this point, you can use the following command to check the Hadoop version number:

$ /usr/local/hadoop/bin/hadoop version

If the following information is returned, the installation is successful:

Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.1.jar
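As an optional way to verify stand-alone mode (this is the standard example from the Hadoop documentation, not an extra requirement of the book), run the bundled MapReduce examples jar against some local files:

$ cd /usr/local/hadoop
$ mkdir ./input
$ cp ./etc/hadoop/*.xml ./input
$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep ./input ./output 'dfs[a-z.]+'
$ cat ./output/*

Note that the ./output directory must not exist beforehand; delete it first if the example is run again.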

Hadoop pseudo-distributed installation

In a distributed installation, Hadoop storage uses HDFS, and the name node and data nodes run on different machines. A pseudo-distributed installation simulates a distributed cluster, but the cluster has only one node, so the name node and data node both run on the same machine. A truly distributed installation can still be achieved on a single computer with the help of technologies such as virtual machines and Docker; the next article will cover building distributed Hadoop with Docker.

First, modify two files (core-site.xml and hdfs-site.xml) in the Hadoop installation directory.
PS: The files below are located in /usr/local/hadoop/etc/hadoop. Use the cd command to enter the directory and the ls command to list its contents.

The modified core-site.xml content is:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

The modified hdfs-site.xml content is:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>127.0.0.1:50070</value>
    </property>
</configuration>

The content here differs slightly from the book; both versions work and are given for reference only.

After the configuration is complete, initialize (format) the name node:

$ cd /usr/local/hadoop
$ ./bin/hdfs namenode -format

Execution produces a long output, and the last few lines can look like an error. Don't worry: look for "successfully formatted" in the last ten or so lines of the output. As long as the configuration above was followed, initialization will succeed. If the message cannot be found, go back and debug the configuration.
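For reference, the success message looks roughly like this (the path comes from dfs.namenode.name.dir; the exact wording may vary slightly between versions):

Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.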

Once the initialization has succeeded, you can start HDFS with the following commands:

$ cd /usr/local/hadoop
$ ./sbin/start-dfs.sh

If a response similar to the following appears (the secondary name node line shows your own hostname):

Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [peryol-ThinkPad-T540p]
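If start-dfs.sh instead aborts with a complaint that JAVA_HOME is not set or cannot be found (a problem the author did not hit, so this is only a hedged suggestion), setting it explicitly in Hadoop's own environment file usually helps, assuming the JDK path configured earlier:

$ vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

and add the line:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_301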

Next, use the jps command to view all running Java processes:

$ jps

24297 Jps
24156 SecondaryNameNode
23932 DataNode
23789 NameNode

If jps shows the processes above running normally (NameNode, DataNode, and SecondaryNameNode; the process IDs will differ), the startup is successful, and the web interface can be accessed in a browser at http://localhost:50070. If you followed the book's configuration instead, the page is still accessible, but the port number will differ slightly from 50070.
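With HDFS running, a quick way to exercise it (an illustrative sketch, not a step from the book; the directory names are arbitrary) is to create a user directory and upload a few files:

$ cd /usr/local/hadoop
$ ./bin/hdfs dfs -mkdir -p /user/hadoop
$ ./bin/hdfs dfs -mkdir input
$ ./bin/hdfs dfs -put ./etc/hadoop/*.xml input
$ ./bin/hdfs dfs -ls input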

Supplementary notes

  1. Formatting is only needed before the first startup; it is not required for subsequent startups. If re-formatting becomes necessary, first delete the default data directory (/usr/local/hadoop/tmp) in the installation directory. This is the directory set in the two configuration files modified above.
  2. If a DataStreamerException error occurs when using the put command, first make sure the Linux firewall is turned off. If the error persists on retry, re-initialize Hadoop; see the first point for the precautions.
  3. If other errors occur, check the steps above carefully for small mistakes, such as a missing space or a spelling error in a command.

Finally, if there are any mistakes, corrections are welcome.

Original article: blog.csdn.net/weixin_45704680/article/details/120368821