HDFS configuration and use

As mentioned earlier, Hadoop has three modes: stand-alone mode, pseudo-cluster mode, and cluster mode.

Stand-alone mode: Hadoop exists only as a library and can execute MapReduce tasks on a single computer; it is used only by developers to build a learning and experimentation environment.

Pseudo-cluster mode: In this mode, Hadoop runs on a single machine in the form of daemon processes; it is generally used by developers to build a learning and experimentation environment.

Cluster mode: This is Hadoop's production mode, that is, the mode in which Hadoop actually provides production-level services.

HDFS configuration and startup

Like a database, HDFS starts as a daemon process. To use HDFS, you use an HDFS client that connects to the HDFS server over the network (via sockets) to access the file system.

In the Hadoop running environment chapter, we configured the basic Hadoop environment in a container named hadoop_single. If you closed that container last time, or shut down your computer and the container stopped, start it and enter it again.
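
For example, with the standard Docker commands (assuming the container name hadoop_single from that chapter):

# Start the stopped container
docker start hadoop_single
# Open a shell inside it
docker exec -it hadoop_single /bin/bash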

After entering the container, we confirm that Hadoop exists:

hadoop version

If the result shows a Hadoop version number, Hadoop is present.

Next, we move on to the actual steps.

Create a new hadoop user

Create a new user named hadoop:

adduser hadoop

Install small tools for setting user passwords and managing privileges:

yum install -y passwd sudo

Set hadoop user password:

passwd hadoop

You will be prompted to enter the password twice; be sure to remember it!

Modify the owner of the hadoop installation directory to be the hadoop user:

chown -R hadoop /usr/local/hadoop
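
You can check the result (the path is the Hadoop installation directory used in this tutorial):

ls -ld /usr/local/hadoop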

Then modify the /etc/sudoers file with a text editor. After the line

root    ALL=(ALL)       ALL

add the following line:

hadoop  ALL=(ALL)       ALL
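
If you prefer to do this from the shell, an equivalent one-liner (run as root) is shown below; note that visudo is the safer tool for editing sudoers:

echo 'hadoop  ALL=(ALL)       ALL' >> /etc/sudoers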

Then exit the container.

Stop the container hadoop_single and commit it as a new image named hadoop_proto:

docker stop hadoop_single
docker commit hadoop_single hadoop_proto

Create a new container hdfs_single from this image:

docker run -d --name=hdfs_single --privileged hadoop_proto /usr/sbin/init

At this point, the new user has been created and baked into the image.

Start HDFS

Now enter the newly created container:

docker exec -it hdfs_single su hadoop

You should now be the hadoop user:

whoami

It should show "hadoop".

Generate SSH keys:

ssh-keygen -t rsa

Here you can simply keep pressing Enter to accept the defaults until generation finishes.

Then add the generated key to the trust list:

ssh-copy-id hadoop@172.17.0.2

View the container IP address:

ip addr | grep 172

From the output you can see that the container's IP address is 172.17.0.2; your IP may be different.
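
To confirm that passwordless login works, you can try a quick round trip over SSH (replace the IP with yours; the first connection asks you to confirm the host key):

ssh hadoop@172.17.0.2 whoami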

Before starting HDFS, we make some simple configurations. All Hadoop configuration files are stored in the etc/hadoop subdirectory under the installation directory, so we can enter this directory:

cd $HADOOP_HOME/etc/hadoop

Here we modify two files: core-site.xml and hdfs-site.xml.

In core-site.xml, we add the following property under the <configuration> tag:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://<your IP>:9000</value>
</property>

In hdfs-site.xml, likewise add the following property under the <configuration> tag:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
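
For reference, the whole core-site.xml would then look something like this (a sketch, using the example IP 172.17.0.2 from above); hdfs-site.xml gets the same <configuration> wrapper. dfs.replication sets how many copies of each block HDFS keeps; 1 fits this single-DataNode setup.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.17.0.2:9000</value>
    </property>
</configuration>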

Format the NameNode (this initializes HDFS's file structure):

hdfs namenode -format

Then start HDFS:

start-dfs.sh

The startup proceeds in three steps, starting the NameNode, the DataNode, and the Secondary NameNode in turn.

We can run jps to see the Java processes:
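
If the three daemons started, the output should look something like this (the PIDs will differ):

210 NameNode
339 DataNode
520 SecondaryNameNode
744 Jps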

So far, the HDFS daemons are up and running. Since HDFS has a built-in HTTP panel, we can visit http://<your container IP>:9870/ in a browser to view the HDFS panel and detailed information:

If this page appears, it means that HDFS is configured and started successfully.

Note: If you are using a Linux system without a desktop environment and have no browser, you can skip this step. If you are on Windows but not using Docker Desktop, this step will be difficult for you.
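
Without a browser, you can still verify from the shell that the panel is serving (an alternative check; replace the IP with yours and make sure curl is installed). A status code of 200 or 302 means the panel is up:

curl -s -o /dev/null -w "%{http_code}\n" http://172.17.0.2:9870/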

Using HDFS

HDFS Shell

Back in the hdfs_single container, the following commands can be used to operate on HDFS:

# Display files and subdirectories under the root directory / (absolute path)
hadoop fs -ls /
# Create a new folder (absolute path)
hadoop fs -mkdir /hello
# Upload a file
hadoop fs -put hello.txt /hello/
# Download a file
hadoop fs -get /hello/hello.txt
# Print a file's content
hadoop fs -cat /hello/hello.txt

The commands above are the most basic HDFS operations; many other operations familiar from traditional file systems are also supported, and a few more are shown below.
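
For illustration (standard hadoop fs sub-commands; the paths are just examples):

# Copy a file within HDFS
hadoop fs -cp /hello/hello.txt /hello/hello2.txt
# Move or rename a file
hadoop fs -mv /hello/hello2.txt /hello/renamed.txt
# Show disk usage in human-readable units
hadoop fs -du -h /
# Delete a file (use -r for directories)
hadoop fs -rm /hello/renamed.txt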

HDFS API

HDFS is supported by many back-end platforms. Currently, the official distribution includes C/C++ and Java programming interfaces; in addition, HDFS client libraries can be installed through the Node.js and Python package managers.

Here are the dependency declarations for each package manager:

Maven:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.4</version>
    </dependency>

Gradle:

providedCompile group: 'org.apache.hadoop', name: 'hadoop-hdfs-client', version: '3.1.4'

NPM:

npm i webhdfs 

pip:

pip install hdfs

Here is an example of Java connecting to HDFS (don't forget to change the IP address):

Example:

package com.runoob;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
public class Application {
    public static void main(String[] args) {
        try {
            // Configure the connection address
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://172.17.0.2:9000");
            FileSystem fs = FileSystem.get(conf);
            // Open the file and read its content
            Path hello = new Path("/hello/hello.txt");
            FSDataInputStream ins = fs.open(hello);
            int ch = ins.read();
            while (ch != -1) {
                System.out.print((char)ch);
                ch = ins.read();
            }
            System.out.println();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}
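
To compile and run the example inside the container, you can use Hadoop's classpath helper (a sketch; it assumes the source file Application.java is in the current directory):

javac -cp "$(hadoop classpath)" -d . Application.java
java -cp ".:$(hadoop classpath)" com.runoob.Application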

Origin: blog.csdn.net/leyang0910/article/details/130534468