1. Installation Environment
This tutorial uses 64-bit CentOS 8 as the system environment; please install the operating system yourself first.
This tutorial is based on native Hadoop 2 and was verified with Hadoop 2.8.5; it should be applicable to any Hadoop 2.x.y release, e.g. Hadoop 2.7.1, Hadoop 2.4.1, and the like.
After installing the CentOS system, some necessary preparation is needed before installing Hadoop.
2. Create a user hadoop
If you did not use a "hadoop" user when installing CentOS, you need to add a user named hadoop:
- useradd -m hadoop -s /bin/bash # create a new user hadoop
Then set a password for the hadoop user:
- passwd hadoop # set the password, entering it twice as prompted
You can grant the hadoop user administrator privileges to make deployment easier and avoid some permission issues that are troublesome for beginners. Execute:
- visudo
Find the line root ALL=(ALL) ALL (around line 98; press ESC first, then type :98 and press Enter, i.e. a colon followed by 98, to jump directly to line 98), then add the following line below it: hadoop ALL=(ALL) ALL (the fields are separated by tabs), as shown below:
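The exact format of that line can be sanity-checked with a small grep pattern. This is only a sketch: it tests against a sample string, since reading /etc/sudoers directly requires root:

```shell
# Check a sample line against the pattern expected in /etc/sudoers
# (on the real system you would run: sudo grep '^hadoop' /etc/sudoers)
line="$(printf 'hadoop\tALL=(ALL)\tALL')"
echo "$line" | grep -qE '^hadoop[[:space:]]+ALL=\(ALL\)[[:space:]]+ALL$' \
  && echo "sudoers line OK"
```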
3. Preparations
After logging in as the hadoop user, you still need to install some software before you can install Hadoop.
Install SSH and configure passwordless SSH login
Both cluster and single-node modes require SSH login (similar to remote login: you can log in to a Linux host and run commands on it). Under normal circumstances, CentOS installs the SSH client and SSH server by default. Open a terminal and test with the following command:
- rpm -qa | grep ssh
If the returned result, as shown below, contains both the SSH client and the SSH server, no installation is needed.
If you do need to install them, you can install via yum (the installation will prompt [y/N]; enter y):
- sudo yum install openssh-clients
- sudo yum install openssh-server
Then execute the following command to test whether SSH is available:
- ssh localhost
At this point a prompt will appear (the first-connection SSH prompt); enter yes. Then enter the hadoop user's password as prompted, and you are logged in to the local machine.
However, this kind of login requires entering the password every time; it is more convenient to configure passwordless SSH login.
First enter exit to leave the ssh session we just started and return to the original terminal window, then use ssh-keygen to generate a key and add the key to the authorized list:
- exit # exit the ssh localhost session
- cd ~/.ssh/ # if this directory does not exist, run ssh localhost once first
- ssh-keygen -t rsa # prompts will appear; just press Enter for all of them
- cat ./id_rsa.pub >> ./authorized_keys # add the key to the authorized list
- chmod 600 ./authorized_keys # modify the file permissions
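For reference, the same steps can also be scripted without any interactive prompts. This is a sketch assuming the default key path ~/.ssh/id_rsa and no existing key (ssh-keygen will ask before overwriting one):

```shell
# Non-interactive variant of the key setup above (sketch)
mkdir -p ~/.ssh && chmod 700 ~/.ssh            # make sure the directory exists
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa       # empty passphrase, no prompts
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```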
Install the Java environment
For the Java environment you can choose Oracle's JDK or OpenJDK. Most Linux systems now install OpenJDK by default; for example, CentOS 6.4 installs OpenJDK 1.8 by default. According to http://wiki.apache.org/hadoop/HadoopJavaVersions, Hadoop runs without problems under OpenJDK 1.8. Note that CentOS 6.4 installs only the Java JRE by default, not the JDK; to make development easier we still install the JDK through yum (the installation will prompt [y/N]; enter y):
- sudo yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
The above command installs OpenJDK; the default installation location is /usr/lib/jvm/java-1.8.0-openjdk. (You can confirm the path by running rpm -ql java-1.8.0-openjdk-devel | grep '/bin/javac'; the command outputs a path, and removing the trailing "/bin/javac" leaves the correct installation path.) After installation, OpenJDK's java, javac, and other commands can be used directly.
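The path-trimming described above can also be done directly in the shell with suffix removal. This sketch uses a sample path standing in for the actual rpm output:

```shell
# Strip the trailing /bin/javac from the path printed by
#   rpm -ql java-1.8.0-openjdk-devel | grep '/bin/javac'
# (a sample path is used here for illustration)
javac_path=/usr/lib/jvm/java-1.8.0-openjdk/bin/javac
java_home=${javac_path%/bin/javac}   # shell suffix removal
echo "$java_home"                    # → /usr/lib/jvm/java-1.8.0-openjdk
```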
Next you need to configure the JAVA_HOME environment variable. For convenience, we set it in ~/.bashrc (further reading: methods of setting Linux environment variables and their differences):
- vim ~/.bashrc
Add the following single line (the installation location of the JDK) at the end of the file and save:
- export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
As shown below:
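If you prefer not to open an editor, the same line can be appended and activated from the command line; this is a sketch assuming the default OpenJDK path shown above:

```shell
# Append the JAVA_HOME setting to ~/.bashrc and reload it (sketch)
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' >> ~/.bashrc
source ~/.bashrc
echo "$JAVA_HOME"
```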
Then make the environment variable take effect by executing the following command:
- source ~/.bashrc # make the variable setting take effect
After setting it, test whether the setting is correct:
- echo $JAVA_HOME # check the variable's value
- java -version
- $JAVA_HOME/bin/java -version # same as executing java -version directly
If everything is set correctly, $JAVA_HOME/bin/java -version outputs the Java version information, identical to the output of java -version, as shown below:
With that, the Java runtime environment required by Hadoop is installed.
Install Hadoop 2
Hadoop 2 can be downloaded from http://mirror.bit.edu.cn/apache/hadoop/common/ or http://mirrors.cnnic.cn/apache/hadoop/common/; this tutorial uses version 2.8.5. Please download the file in the hadoop-2.x.y.tar.gz format, which is already compiled; the other download, containing src, is the Hadoop source code and needs to be compiled before use.
Type the following command on the command line to get Hadoop 2.
ps: if wget is not installed, install wget first.
Extract the downloaded Hadoop package:
- sudo tar -zxf ~/hadoop-2.8.5.tar.gz -C /usr/local # extract into /usr/local
- cd /usr/local
- sudo mv ./hadoop-2.8.5/ ./hadoop # rename the folder to hadoop
- sudo chown -R hadoop:hadoop ./hadoop # modify the file ownership
Hadoop can be used as soon as it is extracted. Enter the following command to check whether Hadoop is available; on success it displays the Hadoop version information:
- cd /usr/local/hadoop
- ./bin/hadoop version
Hadoop stand-alone configuration (non-distributed)
Hadoop's default mode is non-distributed mode, which can run without additional configuration. In non-distributed mode Hadoop runs as a single Java process, which is convenient for debugging.
Now we can run one of Hadoop's examples to get a feel for it. Hadoop comes with a wealth of examples (run ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar to see them all), including wordcount, terasort, join, grep, and so on.
In this example we choose to run grep: we take all the files in the input folder as input, select the words matching the regular expression dfs[a-z.]+ and count their occurrences, and finally write the results to the output folder.
- cd /usr/local/hadoop
- mkdir ./input
- cp ./etc/hadoop/*.xml ./input # configuration file as input
- ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep ./input ./output 'dfs[a-z.]+'
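To see what the regular expression itself matches, independent of Hadoop, you can try it with plain grep on a few sample lines (a sketch):

```shell
# The pattern dfs[a-z.]+ matches "dfs" followed by lowercase letters or dots
printf 'dfsadmin\ndfs.replication\nhdfs-site\n' | grep -oE 'dfs[a-z.]+'
# prints:
#   dfsadmin
#   dfs.replication
```

Note that "hdfs-site" yields no match, because the hyphen after "dfs" is not in the character class [a-z.].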
The results of the run are shown below:
- cat ./output/* # view the run results
Note: Hadoop does not overwrite result files by default, so running the above example again will display an error; you need to delete ./output first.
- rm -r ./output