Installing Hadoop on CentOS 8

1. Installation Environment

 

This tutorial uses CentOS 8 (64-bit) as the system environment; install the system yourself beforehand.

This tutorial is based on native Hadoop 2 and was verified with Hadoop 2.8.5, but it should apply to any Hadoop 2.x.y release, e.g. Hadoop 2.7.1 or Hadoop 2.4.1.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Once CentOS is installed, some preparatory work is needed before installing Hadoop.

2. Create a hadoop user

If you did not create a "hadoop" user when installing CentOS, you need to add a user named hadoop first. As root, run:

    useradd -m hadoop -s /bin/bash    # create a new hadoop user


Create a password for the hadoop user:
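As root, run the following and enter the new password twice when prompted:

    passwd hadoop    # set the hadoop user's password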


You can grant the hadoop user administrator privileges to ease deployment and avoid permission issues that can be tricky for beginners. Execute:

    visudo

In the editor, find the line root ALL=(ALL) ALL (it is around line 98; press ESC, then type :98 — a colon followed by 98 — and press Enter to jump straight to line 98), then add the following line directly below it: hadoop ALL=(ALL) ALL (the fields are separated by tabs), as shown below:
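That part of the file should then look like this:

    root    ALL=(ALL)       ALL
    hadoop  ALL=(ALL)       ALL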


3. Preparations

After logging in as the hadoop user, you still need to install some software before you can install Hadoop.

Install SSH and configure passwordless SSH login

Both cluster and single-node setups need SSH login (similar to a remote login: you sign in to a Linux host and run commands there). Normally, CentOS installs both the SSH client and the SSH server by default. To check, open a terminal and run the following command:

    rpm -qa | grep ssh

If the returned result looks like the output below, including both the SSH client and the SSH server, then nothing needs to be installed.
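For example, output along these lines (the package versions are illustrative and will differ on your system):

    openssh-8.0p1-3.el8.x86_64
    openssh-clients-8.0p1-3.el8.x86_64
    openssh-server-8.0p1-3.el8.x86_64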


If either is missing, it can be installed through yum (the installation will prompt [y/N]; enter y):

    sudo yum install openssh-clients
    sudo yum install openssh-server

Then execute the following command to test that SSH works:

    ssh localhost

At this point a prompt appears (the SSH first-connection prompt); enter yes. Then enter the hadoop user's password as prompted, and you are logged in to this machine.
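The prompt looks roughly like this (the key fingerprint will differ):

    The authenticity of host 'localhost (::1)' can't be established.
    ECDSA key fingerprint is SHA256:...
    Are you sure you want to continue connecting (yes/no)? yes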


However, logging in this way requires entering the password every time. Configuring passwordless SSH login is more convenient.

First enter exit to quit the ssh session and return to the original terminal window, then use ssh-keygen to generate a key and add the key to the authorized list:

    exit                                    # exit the ssh localhost session
    cd ~/.ssh/                              # if this directory does not exist, run ssh localhost once first
    ssh-keygen -t rsa                       # prompts will appear; just press Enter at each one
    cat ./id_rsa.pub >> ./authorized_keys   # add the key to the authorized list
    chmod 600 ./authorized_keys             # fix the file permissions
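Now ssh localhost should log you straight in without asking for a password:

    ssh localhost    # no password prompt this time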

Install the Java environment

For the Java environment you can use either Oracle's JDK or OpenJDK; most Linux distributions now install OpenJDK by default. According to http://wiki.apache.org/hadoop/HadoopJavaVersions, Hadoop runs fine on OpenJDK 1.8. Note, however, that the default installation includes only the Java JRE, not the JDK; to make development easier, install the JDK through yum as well (the installation will prompt [y/N]; enter y):

    sudo yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel

The command above installs OpenJDK to its default location, /usr/lib/jvm/java-1.8.0-openjdk. (The path can be confirmed by running rpm -ql java-1.8.0-openjdk-devel | grep '/bin/javac': the command outputs a path, and stripping the trailing "/bin/javac" leaves the correct installation path.) After installation, the java, javac, and other commands can be used directly.
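For example (the versioned directory name in the output is illustrative and will differ on your machine):

    rpm -ql java-1.8.0-openjdk-devel | grep '/bin/javac'
    # sample output: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b10-0.el8.x86_64/bin/javac
    # strip the trailing /bin/javac to obtain the installation path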

Next you need to configure the JAVA_HOME environment variable. For convenience, we set it in ~/.bashrc (further reading: the different methods of setting Linux environment variables and how they differ):

    vim ~/.bashrc

Add the following single line (the JDK installation path) at the very end of the file, then save it:

    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk


Then make the environment variable take effect by executing the following:

    source ~/.bashrc    # make the variable setting take effect

After setting it, check whether the configuration is correct:

    echo $JAVA_HOME                # check the variable's value
    java -version
    $JAVA_HOME/bin/java -version   # should produce the same output as java -version

If everything is correct, $JAVA_HOME/bin/java -version outputs the Java version information, identical to the output of java -version, as shown below:
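For example (the build numbers are illustrative and will vary):

    openjdk version "1.8.0_222"
    OpenJDK Runtime Environment (build 1.8.0_222-b10)
    OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)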


With that, the Java runtime environment required by Hadoop is installed.

 

Install Hadoop 2

Hadoop 2 can be downloaded from http://mirror.bit.edu.cn/apache/hadoop/common/ or http://mirrors.cnnic.cn/apache/hadoop/common/; this tutorial uses version 2.8.5. Download the file of the form hadoop-2.x.y.tar.gz, which is precompiled; the download whose name contains src is the Hadoop source code and must be compiled before use.

At the command line, type the following command to fetch Hadoop 2:
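For example, from one of the mirrors above (assuming the standard Apache directory layout; adjust the mirror and version as needed):

    wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz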


PS: if wget is not yet installed, install it first.
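wget is available through yum:

    sudo yum install wget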


Extract the downloaded Hadoop archive:

    sudo tar -zxf ~/hadoop-2.8.5.tar.gz -C /usr/local    # extract into /usr/local
    cd /usr/local/
    sudo mv ./hadoop-2.8.5/ ./hadoop                     # rename the folder to hadoop
    sudo chown -R hadoop:hadoop ./hadoop                 # fix the ownership

 

Hadoop is ready to use once extracted. Enter the following commands to check that Hadoop works; on success it displays its version information:

    cd /usr/local/hadoop
    ./bin/hadoop version
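The first line of the output shows the version, e.g.:

    Hadoop 2.8.5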

 

Hadoop stand-alone configuration (non-distributed)

Hadoop's default mode is the non-distributed mode, which runs without any additional configuration. In non-distributed mode Hadoop runs as a single Java process, which makes it convenient to debug.

Now we can run one of Hadoop's examples to get a feel for how it runs. Hadoop comes with a wealth of examples (running ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar lists all of them), including wordcount, terasort, join, grep, and so on.
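Running that jar without further arguments prints the list of available examples, something like this (abridged):

    An example program must be given as the first argument.
    Valid program names are:
      grep: A map/reduce program that counts the matches of a regex in the input.
      wordcount: A map/reduce program that counts the words in the input files.
      ...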


In this example we choose grep: it takes all the files in the input folder as input, filters for the words matching the regular expression dfs[a-z.]+ and counts their occurrences, and finally writes the results to the output folder.

    cd /usr/local/hadoop
    mkdir ./input
    cp ./etc/hadoop/*.xml ./input    # use the configuration files as input
    ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep ./input ./output 'dfs[a-z.]+'

When the job finishes, view the results:


    cat ./output/*    # view the run results
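With the default configuration files as input, the result is typically a single matching word:

    1       dfsadmin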


Note: Hadoop does not overwrite result files by default, so running the example above again reports an error; you need to delete ./output first:

    rm -r ./output
