Hadoop deployment series

There are three ways to deploy Hadoop:

  • Standalone Operation (single-node cluster): By default, Hadoop is configured to run in a non-distributed mode as a single Java process. This is useful for debugging.
  • Pseudo-Distributed Operation: Hadoop runs on a single node, with each Hadoop daemon running in a separate Java process.
  • Fully-Distributed Operation: a true multi-node cluster deployment.
Component versions
Hadoop 3.2.1
CentOS 7
Java 1.8
IDEA 2018.3
Gradle 4.8
Spring Boot 2.1.2.RELEASE

Apache Hadoop 3.2.1 Single node deployment

Java installation

Because Hadoop is Java-based, a Java environment is required.
Installing JDK 1.8 on CentOS 7
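A minimal sketch of one common way to do this, assuming the OpenJDK 1.8 build from the base CentOS 7 repositories is acceptable (any JDK 1.8 works, as long as JAVA_HOME below points at it):

  $ sudo yum install -y java-1.8.0-openjdk-devel   # the -devel package includes javac
  $ java -version                                  # confirm a 1.8.x runtime is on the PATH

Note that the profile below sets JAVA_HOME=/usr/java/default, which is the Oracle JDK convention; with OpenJDK, point JAVA_HOME at the actual install path (under /usr/lib/jvm/) instead.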

Download the installation package

Apache Hadoop official download page

Apache Hadoop 3.2.1 binary Download
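For example, fetching it directly on the server (the URL shown is the Apache archive; any mirror from the download page works equally well):

  $ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz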


Unzip to the specified directory on the server

Generally, we put Hadoop under the /usr/local directory. The tarball already contains a top-level hadoop-3.2.1 directory, so extracting into /usr/local yields /usr/local/hadoop-3.2.1:

# tar -zxvf hadoop-3.2.1.tar.gz -C /usr/local


Configure environment variables

Edit /etc/profile:

export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/local/hadoop-3.2.1/

export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
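
Reload the profile so the variables take effect in the current shell, then verify that the hadoop binary is found:

  $ source /etc/profile
  $ hadoop version   # should report Hadoop 3.2.1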


HDFS Shell Command List

HDFS Shell Command official documents


The hadoop fs ... and hdfs dfs ... commands behave the same here, because both are translated into the same underlying shell command.
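
As a quick sketch of the commands you will use most often (the paths here are illustrative):

  $ hdfs dfs -mkdir -p /user/hadoop              # create a directory
  $ hdfs dfs -put localfile.txt /user/hadoop     # upload a local file
  $ hdfs dfs -ls /user/hadoop                    # list a directory
  $ hdfs dfs -cat /user/hadoop/localfile.txt     # print a file
  $ hdfs dfs -rm /user/hadoop/localfile.txt      # delete a file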

Testing the Hadoop installation

Run the following from the Hadoop installation directory ($HADOOP_HOME); this exercises the default standalone (local) mode:
  $ mkdir input
  $ cp etc/hadoop/*.xml input
  $ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
  $ cat output/*
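
One caveat: MapReduce refuses to run if the output directory already exists (it fails with a FileAlreadyExistsException), so remove it before re-running the example:

  $ rm -r output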

Apache Hadoop 3.2.1 pseudo-distributed deployment

Hadoop environment configuration file

Modify the configuration file etc/hadoop/hadoop-env.sh; the relevant line is at approximately line 54:

export JAVA_HOME=/usr/java/default

Configuration file settings

Edit etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- Replace localhost with your host's IP (e.g. hdfs://192.168.84.132:9000),
             otherwise this Hadoop instance cannot be accessed from other machines. -->
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
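
A quick way to confirm the value was picked up (hdfs getconf is part of the standard distribution):

  $ hdfs getconf -confKey fs.defaultFS
  hdfs://localhost:9000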

Edit etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
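
dfs.replication is set to 1 because a pseudo-distributed cluster has only one DataNode, so the default factor of 3 could never be satisfied. Once files exist in HDFS, the effective replication of a file can be checked like this (the path is illustrative):

  $ hdfs dfs -stat %r /user/hadoop/somefile.txt
  1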

SSH settings

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

  $ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys

Formatting HDFS and execution

The following instructions are for running a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.

Format the filesystem:

  $ bin/hdfs namenode -format

Start NameNode daemon and DataNode daemon:

  $ sbin/start-dfs.sh

The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

Starting DFS can take a while, so be patient.
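
A common way to confirm the daemons are up is the JDK's jps tool; for this pseudo-distributed setup, expect a NameNode, a DataNode, and a SecondaryNameNode (the PIDs below are illustrative):

  $ jps
  11602 NameNode
  11745 DataNode
  11943 SecondaryNameNode
  12099 Jps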

When it has started, you can access the Hadoop web console at http://192.168.84.132:9870/dfshealth.html#tab-overview
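
From here, the official single-node guide continues by running the same grep example against HDFS rather than the local filesystem; a sketch, run from $HADOOP_HOME (replace <username> with your own user):

  $ hdfs dfs -mkdir -p /user/<username>
  $ hdfs dfs -mkdir input
  $ hdfs dfs -put etc/hadoop/*.xml input
  $ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
  $ hdfs dfs -cat output/*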
