Hadoop deployment series
There are three ways to deploy Hadoop:
- Standalone Operation (single-node): by default, Hadoop is configured to run in non-distributed mode as a single Java process. This is useful for debugging.
- Pseudo-Distributed Operation: runs on a single node, with each Hadoop daemon in its own Java process.
- Fully-Distributed Operation: a true cluster deployment across multiple nodes.
Component | Version |
---|---|
Hadoop | 3.2.1 |
CentOS | 7 |
Java | 1.8 |
IDEA | 2018.3 |
Gradle | 4.8 |
Spring Boot | 2.1.2.RELEASE |
Apache Hadoop 3.2.1 Single node deployment
Java installation
Because Hadoop is Java-based, a Java environment is required.
Install JDK 1.8 on CentOS 7
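As a minimal sketch, JDK 1.8 can be installed from the standard CentOS 7 repositories (the OpenJDK package names below are one option; an Oracle JDK rpm works just as well):
# yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
# java -version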
Download the installation package
Apache Hadoop official download page
Apache Hadoop 3.2.1 binary Download
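For example, the archive can be fetched directly on the server from the Apache archive (any mirror listed on the download page works as well):
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz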
Extract to the specified directory on the server
Generally, we put Hadoop under the /usr/local/ directory (the tarball already contains a top-level hadoop-3.2.1 directory):
# tar -zxvf hadoop-3.2.1.tar.gz -C /usr/local/
Configure environment variables
Configuration file /etc/profile:
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/local/hadoop-3.2.1/
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
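Reload the profile so the changes take effect in the current shell, then verify that Hadoop is on the PATH:
$ source /etc/profile
$ hadoop version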
HDFS Shell Command List
HDFS shell command official documentation
The hadoop fs ... and hdfs dfs ... forms behave the same, because both are translated into the same underlying command in the shell.
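A few commonly used commands, as a sketch (the paths and file name here are illustrative):
$ hdfs dfs -mkdir -p /user/root
$ hdfs dfs -put localfile.txt /user/root/
$ hdfs dfs -ls /user/root
$ hdfs dfs -cat /user/root/localfile.txt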
Test the Hadoop installation
$ mkdir input
$ cp etc/hadoop/*.xml input
$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*
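With the default configuration files the job typically reports a single match (dfsadmin). Note that MapReduce refuses to overwrite an existing output directory, so remove it before re-running the example:
$ rm -rf output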
Apache Hadoop 3.2.1 pseudo-distributed deployment
Hadoop environment configuration file
Modify the configuration file etc/hadoop/hadoop-env.sh, around line 54:
export JAVA_HOME=/usr/java/default
Configuration file settings
Configuration file etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <!-- Replace localhost with your host's IP, otherwise this Hadoop
         instance will not be reachable from other machines:
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.84.132:9000</value>
    </property>
    -->
</configuration>
Configuration file etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
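dfs.replication defaults to 3; with only a single DataNode, blocks could never reach that replication factor, which is why it is lowered to 1 here. The effective value can be verified with:
$ hdfs getconf -confKey dfs.replication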
SSH settings
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Formatting HDFS
Execution
The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.
Format the filesystem:
$ bin/hdfs namenode -format
Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
Starting DFS can take a while, so be patient.
Once startup completes, you can access the Hadoop web console at http://192.168.84.132:9870/dfshealth.html#tab-overview
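A quick way to confirm that the daemons are running is the JDK's jps tool; for this pseudo-distributed setup you should see NameNode, DataNode, and SecondaryNameNode processes (PIDs will differ):
$ jps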