Hadoop series--installation and setup

Installation

Other URLs

Apache Hadoop installation and configuration on a single node - 1. Prerequisites - "Apache Hadoop Introduction Tutorial" (BookStack)

Description

The following items need to be installed:

JDK or JRE

    Either OpenJDK or the Oracle JDK can be used.

    There are version requirements; see https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions. In short:

  • Apache Hadoop 3.3 and upper supports Java 8 and Java 11 (runtime only). Compile Hadoop with Java 8; compiling with Java 11 is not supported (HADOOP-16795, Java 11 compile support, still open).
  • Apache Hadoop from 3.0.x to 3.2.x supports only Java 8.
  • Apache Hadoop from 2.7.x to 2.10.x supports both Java 7 and Java 8.
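
As a quick check, confirm which Java version is actually on the Path:

java -version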

Windows installation

Other URLs

Hadoop Windows local environment installation (lockie's blog, CSDN)
Windows installation and configuration of Hadoop 3.x
Win10 installation and configuration of Hadoop

Download

Download Hadoop

http://hadoop.apache.org/releases.html

Note: versions 2.7 and 3.2 are the two watershed releases. Here we download hadoop-3.2.2.tar.gz.

Download winutils

Hadoop cannot run directly on Windows; you need to download winutils first.

The Hadoop project does not ship winutils, so you would have to compile it yourself. Pre-built binaries are available in third-party GitHub repositories; either of the following two works:

Link 1 (continuously updated): https://github.com/cdarlint/winutils
Link 2 (no longer updated): https://github.com/steveloughran/winutils

Here, download version 3.2.1 from the first link.

Copy the files from the downloaded bin directory into the bin directory of the Hadoop archive you extracted in the first step, overwriting any duplicates. The files that matter are hadoop.dll and winutils.exe.
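
For example, assuming the winutils repository was cloned to D:\downloads\winutils (an illustrative path) and Hadoop was extracted to D:\dev\bigdata\hadoop-3.2.2:

xcopy /Y D:\downloads\winutils\hadoop-3.2.1\bin\* D:\dev\bigdata\hadoop-3.2.2\bin\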

Configuration

Modify environment variables

This step is required; otherwise running start-all.cmd later fails with an error that the system cannot find hadoop.

New environment variable => system variable: name: HADOOP_HOME, value: the extraction path, e.g. here: D:\dev\bigdata\hadoop-3.2.2
Modify environment variable => system variable: name: Path, value: append: %HADOOP_HOME%\bin
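
Alternatively, HADOOP_HOME can be set from an administrator command prompt (Path is still easier to edit through the GUI, since setx truncates values longer than 1024 characters):

setx /M HADOOP_HOME "D:\dev\bigdata\hadoop-3.2.2"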

Test: open a new cmd window and run the following command; if it prints version information, the installation succeeded.

hadoop version

Configure hadoop

The following modifications are all to files under: <extraction path>/etc/hadoop/

This configuration sets up pseudo-distributed mode.

hadoop-env.cmd

Change

set JAVA_HOME=%JAVA_HOME%

to

set JAVA_HOME=D:\dev\Java\jdk1.8.0_201

Note:

My JDK was originally installed at D:\Program Files\Java\jdk1.8.0_201. Because the path contains a space, hadoop-env.cmd breaks. The usual workarounds, substituting the 8.3 short name PROGRA~1 for Program Files or wrapping the path in quotes, only apply when the JDK sits under D:\Program Files\xxx; in the end I could only move the JDK to a path without spaces.
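
If moving the JDK is not an option, the 8.3 short-name form of the path may be worth trying first (verify the actual short name with dir /x D:\):

rem 8.3 short-name form of a path containing spaces
set JAVA_HOME=D:\PROGRA~1\Java\jdk1.8.0_201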

core-site.xml

First create a new tmp folder under the Hadoop extraction path.
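
For example, matching the path used in the configuration below:

mkdir D:\dev\bigdata\hadoop-3.2.2\tmp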

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/D:/dev/bigdata/hadoop-3.2.2/tmp</value>
    </property>
</configuration>
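
Once the environment variables are in place, this setting can be sanity-checked from cmd; it should print hdfs://localhost:9000:

hdfs getconf -confKey fs.defaultFS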

hdfs-site.xml

First create namenode and datanode folders under the tmp folder from the previous step.

Set dfs.replication to 1 for a single node; for a multi-node cluster, set it according to the number of nodes.
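
For example:

mkdir D:\dev\bigdata\hadoop-3.2.2\tmp\namenode
mkdir D:\dev\bigdata\hadoop-3.2.2\tmp\datanode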

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/D:/dev/bigdata/hadoop-3.2.2/tmp/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/D:/dev/bigdata/hadoop-3.2.2/tmp/datanode</value>
    </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>

<configuration>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

</configuration>

Format the namenode

Go to the bin directory under the extraction path and run:

hdfs namenode -format

or

hadoop.cmd namenode -format

If it works, the output reports that the namenode has been successfully formatted. If it errors out, likely causes include: a wrong environment variable configuration (for example, a space in the path), or a winutils version that does not match the Hadoop version.

Usage

Start Hadoop

Enter <extraction path>\sbin and execute:

start-all.cmd

Four console windows will then pop up. Run jps in cmd and you should see the following four processes (Jps itself also shows up):

5472  DataNode
14776 ResourceManager
15688 NameNode
14300 Jps
16844 NodeManager
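
As a quick smoke test that HDFS accepts writes (the file and directory names are just examples):

echo hello> hello.txt
hadoop fs -mkdir /demo
hadoop fs -put hello.txt /demo
hadoop fs -ls /demo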

View cluster status

Visit: http://localhost:8088/ (the YARN ResourceManager web UI)

View Hadoop status

Visit: http://localhost:9870 (the NameNode web UI). Note: Hadoop 2.x used http://localhost:50070 instead.
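
The same status information is also available from the command line:

hdfs dfsadmin -report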

Stop Hadoop

Enter <extraction path>\sbin and execute:

stop-all.cmd

Docker installation

Origin: blog.csdn.net/feiying0canglang/article/details/113923682