Installation
The following items need to be installed:
JDK or JRE
Either OpenJDK or the Oracle JDK can be used.
There are version requirements; see https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions The requirements are as follows:
- Apache Hadoop 3.3 and later supports Java 8 and Java 11 (runtime only). Compile Hadoop with Java 8; compiling with Java 11 is not yet supported (HADOOP-16795 - Java 11 compile support).
- Apache Hadoop 3.0.x to 3.2.x supports only Java 8.
- Apache Hadoop 2.7.x to 2.10.x supports both Java 7 and Java 8.
Windows installation
Download
Download hadoop
http://hadoop.apache.org/releases.html
Note: 2.7 and 3.2 are two watershed versions. This guide uses hadoop-3.2.2.tar.gz.
Download winutils
Hadoop cannot run directly on Windows; you also need winutils.
The Hadoop project does not ship winutils, so it normally has to be compiled by hand, but pre-built binaries are available in third-party GitHub repositories. Either of these two works:
Link 1 (actively updated): https://github.com/cdarlint/winutils
Link 2 (no longer updated): https://github.com/steveloughran/winutils
Version 3.2.1 from the first link is used here.
Copy the files from the downloaded bin directory into the bin directory of the Hadoop archive extracted in the download step, overwriting any duplicates. (The key files are hadoop.dll and winutils.exe.)
Configuration
Modify environment variables
This step is required; otherwise running start-all.cmd later fails with an error that the system cannot find hadoop.
New environment variable => system variable: name: HADOOP_HOME, value: the decompression path, e.g. here: D:\dev\bigdata\hadoop-3.2.2
Modify environment variable => system variable: name: Path, value: append %HADOOP_HOME%\bin
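As an alternative to the GUI, both variables can be set from an administrator command prompt with setx (a sketch using the example path above; adjust it to your own). Note that setx /M writes system variables and silently truncates values longer than 1024 characters, so if your Path is already long, edit it through the GUI instead:

```bat
setx /M HADOOP_HOME "D:\dev\bigdata\hadoop-3.2.2"
setx /M PATH "%PATH%;D:\dev\bigdata\hadoop-3.2.2\bin"
```

Open a new cmd window afterwards; setx does not affect the current session.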
Test: open cmd and run the following command; if it prints version information normally, the installation succeeded.
hadoop version
Configure hadoop
All of the files modified below are under: decompression path/etc/hadoop/
This configuration sets up pseudo-distributed mode.
hadoop-env.cmd
Change
set JAVA_HOME=%JAVA_HOME%
to
set JAVA_HOME=D:\dev\Java\jdk1.8.0_201
Note:
My JDK was originally at D:\Program Files\Java\jdk1.8.0_201. Because that path contains a space, it cannot be used as-is: either write PROGRA~1 in place of Program Files, or wrap the path in quotes. The PROGRA~1 short name only works when the JDK is under D:\Program Files\xxx.
In the end I simply changed the JDK's installation path.
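To find the 8.3 short name (such as PROGRA~1) for a directory whose name contains spaces, cmd's dir /x switch lists short names next to the long ones (assuming short-name generation is enabled on the volume):

```bat
dir /x D:\
```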
core-site.xml
First create a new tmp folder in the hadoop decompression path.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/D:/dev/bigdata/hadoop-3.2.2/tmp</value>
  </property>
</configuration>
hdfs-site.xml
First create the namenode and datanode folders under the tmp folder in the previous step.
Set dfs.replication to 1 for a single node; for a multi-node cluster, set it according to the number of nodes.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/D:/dev/bigdata/hadoop-3.2.2/tmp/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/D:/dev/bigdata/hadoop-3.2.2/tmp/datanode</value>
  </property>
</configuration>
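The directories referenced in core-site.xml and hdfs-site.xml can be created in one step from cmd; mkdir creates missing parent directories automatically (paths follow the D:\dev\bigdata\hadoop-3.2.2 example used above):

```bat
mkdir D:\dev\bigdata\hadoop-3.2.2\tmp\namenode D:\dev\bigdata\hadoop-3.2.2\tmp\datanode
```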
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Format node
Go to the bin path and execute the following command
hdfs namenode -format
or
hadoop.cmd namenode -format
If all is well, the output shows that the namenode has been successfully formatted. If there is an error, likely causes include: a mistake in the environment variable configuration (such as a space in the path), or a winutils version that does not match the Hadoop version.
Usage
Start hadoop
Go to Hadoop decompression path/sbin and execute:
start-all.cmd
Four new console windows will pop up. Running jps in CMD should show these processes:
5472 DataNode
14776 ResourceManager
15688 NameNode
14300 Jps
16844 NodeManager
View cluster status
Visit: http://localhost:8088/
View HDFS status
Visit: http://localhost:9870
Note: for Hadoop 2.x (releases before 3.0.0), the address is http://localhost:50070 instead.
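With the daemons running, both web UIs can be sanity-checked from cmd (curl ships with Windows 10 1803 and later; a sketch, and in a .bat file the % signs would need to be doubled):

```bat
curl -s -o NUL -w "ResourceManager: %{http_code}\n" http://localhost:8088/
curl -s -o NUL -w "NameNode UI: %{http_code}\n" http://localhost:9870/
```

Both should report 200 when the services are up.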
Close hadoop
Go to Hadoop decompression path/sbin and execute:
stop-all.cmd