Big Data Technology: Hadoop 3.x Setup, Installation, and Tuning (2. HDFS)

1. Overview of HDFS

1.1 Background and definition of HDFS

1) Background of HDFS
As data volumes keep growing, a single operating system can no longer hold all the data, so the data has to be spread across many disks managed by different operating systems. This is inconvenient to manage and maintain, so a system is urgently needed to manage files across multiple machines: a distributed file management system. HDFS is one kind of distributed file management system.
2) Definition of HDFS
HDFS (Hadoop Distributed File System) is, first of all, a file system: it stores files and locates them through a directory tree. Second, it is distributed: many servers work together to provide its functionality, and each server plays its own role.
Usage scenario of HDFS: it is suitable for write-once, read-many workloads. Once a file is created, written, and closed, it does not need to be changed.

1.2 Advantages and disadvantages of HDFS

Advantages of HDFS
1) High fault tolerance
➢ Multiple copies of data are automatically saved. It improves fault tolerance by adding copies.

➢ After a copy is lost, it can be restored automatically.

2) Suitable for processing big data
➢ Data scale: it can handle data at the GB, TB, or even PB level;
➢ File scale: it can handle more than a million files, which is a fairly large number.
3) It can be built on cheap machines, and reliability is improved through the multi-replica mechanism.
Disadvantages of HDFS
1) It is not suitable for low-latency data access; for example, it cannot serve data at the millisecond level.
2) It cannot efficiently store a large number of small files.
➢ Storing a large number of small files would consume a large amount of NameNode memory to hold the file, directory, and block information. This is not advisable, because NameNode memory is always limited;
➢ The seek time for a large number of small files would exceed the read time, which violates the design goal of HDFS.
3) Concurrent writing and random modification of files are not supported.
➢ A file can have only one writer at a time; multiple threads are not allowed to write to it concurrently;
➢ Only appending data (append) is supported; random modification of files is not supported.

1.3 HDFS architecture

1) NameNode (nn): It is the Master, which is a supervisor and manager.
(1) Manage HDFS namespace;
(2) Configure the replica strategy;
(3) Manage data block (Block) mapping information;
(4) Process client read and write requests.
2) DataNode: It is Slave. The NameNode issues commands, and the DataNode performs the actual operations.
(1) Store the actual data block;
(2) Perform the read/write operation of the data block.
3) Client: It is the client.
(1) File splitting. When a file is uploaded to HDFS, the Client splits the file into Blocks and then uploads them;
(2) Interacts with the NameNode to obtain the location information of the file;
(3) Interacts with the DataNode to read or write data;
(4) The Client provides commands to manage HDFS, such as formatting the NameNode;
(5) The Client can access HDFS through commands, for example to create, delete, query, and modify data in HDFS.
4) Secondary NameNode: not a hot standby for the NameNode. When the NameNode goes down, it cannot immediately take over and provide services.
(1) Assists the NameNode and shares its workload, for example by periodically merging Fsimage and Edits and pushing the result to the NameNode;
(2) In an emergency, it can assist in recovering the NameNode.

1.4 HDFS file block size (interview focus)

Files in HDFS are physically stored in blocks (Block). The block size can be specified with the configuration parameter dfs.blocksize; the default is 128 MB in Hadoop 2.x/3.x and 64 MB in Hadoop 1.x.
Thinking: Why can't the block size be set too small or too large?
(1) If the HDFS block size is set too small, it increases the seek time, and the program spends its time looking for the start of blocks;
(2) If the block size is set too large, the time to transfer the data from disk will be significantly longer than the time needed to locate the start of the block. As a result, the program is very slow when processing this block of data.
Summary: The HDFS block size setting mainly depends on the disk transfer rate.
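A rough back-of-the-envelope calculation shows why 128 MB is a reasonable default (this assumes a seek time of about 10 ms and the common rule of thumb that seek time should be roughly 1% of transfer time; the numbers are only illustrative):

seek time ≈ 10 ms
transfer time ≈ 10 ms / 1% = 1 s
block size ≈ 1 s × 100 MB/s (typical mechanical-disk transfer rate) ≈ 100 MB, so the default is set to the nearby power of two, 128 MB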

2. Shell operation of HDFS (development focus)

2.1 Basic syntax

hadoop fs <specific command>   OR   hdfs dfs <specific command>
The two are exactly the same.

2.2 Complete list of commands

[xusheng@hadoop102 hadoop-3.1.3]$ bin/hadoop fs

[-appendToFile <localsrc> ... <dst>]
		[-cat [-ignoreCrc] <src> ...]
		[-chgrp [-R] GROUP PATH...]
		[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
		[-chown [-R] [OWNER][:[GROUP]] PATH...]
		[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
		[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
		[-count [-q] <path> ...]
		[-cp [-f] [-p] <src> ... <dst>]
		[-df [-h] [<path> ...]]
		[-du [-s] [-h] <path> ...]
		[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
		[-getmerge [-nl] <src> <localdst>]
		[-help [cmd ...]]
		[-ls [-d] [-h] [-R] [<path> ...]]
		[-mkdir [-p] <path> ...]
		[-moveFromLocal <localsrc> ... <dst>]
		[-moveToLocal <src> <localdst>]
		[-mv <src> ... <dst>]
		[-put [-f] [-p] <localsrc> ... <dst>]
		[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
		[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
		[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
		[-setrep [-R] [-w] <rep> <path> ...]
		[-stat [format] <path> ...]
		[-tail [-f] <file>]
		[-test -[defsz] <path>]
		[-text [-ignoreCrc] <src> ...]

2.3 Practical operation of common commands

2.3.1 Preparations

1) Start the Hadoop cluster (to facilitate subsequent testing)

[xusheng@hadoop102 hadoop-3.1.3]$ sbin/start-dfs.sh
[xusheng@hadoop103 hadoop-3.1.3]$ sbin/start-yarn.sh

2) -help: output help information for a command

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -help rm

3) Create /sanguo folder

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -mkdir /sanguo


2.3.2 Upload

1) -moveFromLocal: Cut and paste from local to HDFS

[xusheng@hadoop102 hadoop-3.1.3]$ vim shuguo.txt
Enter:
shuguo
[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -moveFromLocal ./shuguo.txt /sanguo


2) -copyFromLocal: Copy files from the local file system to the HDFS path

[xusheng@hadoop102 hadoop-3.1.3]$ vim weiguo.txt
Enter:
weiguo
[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -copyFromLocal weiguo.txt /sanguo


3) -put: equivalent to -copyFromLocal; in production environments, put is more commonly used

[xusheng@hadoop102 hadoop-3.1.3]$ vim wuguo.txt
Enter:
wuguo
[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -put ./wuguo.txt /sanguo

4) -appendToFile: Append a file to the end of an existing file

[xusheng@hadoop102 hadoop-3.1.3]$ vim liubei.txt
Enter:
liubei
[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -appendToFile liubei.txt /sanguo/shuguo.txt


2.3.3 Download

1) -copyToLocal: copy from HDFS to local

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -copyToLocal /sanguo/shuguo.txt ./

2) -get: equivalent to -copyToLocal; in production environments, get is more commonly used

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -get /sanguo/shuguo.txt ./shuguo2.txt


2.3.4 HDFS Direct Operation

1) -ls: Display directory information

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -ls /sanguo

2) -cat: display file content

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -cat /sanguo/shuguo.txt


3) -chgrp, -chmod, -chown: same usage as in the Linux file system; modify the ownership and permissions of a file

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -chmod 666 /sanguo/shuguo.txt
[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -chown xusheng:xusheng /sanguo/shuguo.txt


4) -mkdir: create path

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -mkdir /jinguo

5) -cp: copy from one path of HDFS to another path of HDFS

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -cp /sanguo/shuguo.txt /jinguo

6) -mv: Move files in HDFS directory

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -mv /sanguo/wuguo.txt /jinguo
[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -mv /sanguo/weiguo.txt /jinguo

7) -tail: display the last 1 KB of a file's data

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -tail /jinguo/shuguo.txt


8) -rm: delete files or folders

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -rm /sanguo/shuguo.txt

9) -rm -r: Recursively delete the directory and the contents of the directory

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -rm -r /sanguo

10) -du: display folder size statistics

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -du -s -h /jinguo
27 81 /jinguo
[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -du -h /jinguo
14 42 /jinguo/shuguo.txt
7 21 /jinguo/weiguo.txt
6 18 /jinguo/wuguo.txt

Explanation: the first column is the file size, the second column is the total space consumed by all replicas (size × replication factor, here 3), and the third column is the path.

11) -setrep: Set the number of copies of files in HDFS

[xusheng@hadoop102 hadoop-3.1.3]$ hadoop fs -setrep 10 /jinguo/shuguo.txt


The replication factor set here is only recorded in the NameNode's metadata; whether there are really that many replicas depends on the number of DataNodes. Since there are only 3 devices at the moment, there can be at most 3 replicas. Only when the number of nodes grows to 10 can the replica count actually reach 10.

3. API operation of HDFS

3.1 Client environment preparation

1) Find the Windows dependency folder under the data package path, and copy hadoop-3.1.0 to a non-Chinese path (such as d:\).

Link: https://pan.baidu.com/s/1PqNnlBNCTW1h7HuyxT4hYw
Extraction code: 9oz7

2) Configure the HADOOP_HOME environment variable
3) Configure the Path environment variable.
Note: If the environment variables do not take effect, try restarting the computer.
To verify that the Hadoop environment variables work, double-click winutils.exe. If an error is reported, the Microsoft runtime library is missing (this problem often occurs even on genuine systems); the data package contains the corresponding Microsoft runtime installer, which can be installed by double-clicking it.
4) Create a Maven project HdfsClientDemo in IDEA, and import the corresponding dependency coordinates + log addition

<dependencies>
	<dependency>
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-client</artifactId>
		<version>3.1.3</version>
	</dependency>
	<dependency>
		<groupId>junit</groupId>
		<artifactId>junit</artifactId>
		<version>4.12</version>
	</dependency>
	<dependency>
		<groupId>org.slf4j</groupId>
		<artifactId>slf4j-log4j12</artifactId>
		<version>1.7.30</version>
	</dependency>
</dependencies>

In the src/main/resources directory of the project, create a new file named "log4j.properties" and fill it with the following content:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

5) Create the package com.xusheng.hdfs
6) Create the HdfsClient class

3.2 HDFS API case practice

3.2.1 HDFS file upload (test parameter priority)

1) Write source code

@Test
public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	configuration.set("dfs.replication", "2");
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "xusheng");
	// 2 Upload the file
	fs.copyFromLocalFile(new Path("d:/sunwukong.txt"), new Path("/xiyou/huaguoshan"));
	// 3 Close the resource
	fs.close();
}

2) Copy hdfs-site.xml to the resources directory of the project

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>

3) Parameter priority
Parameter priority, from highest to lowest: (1) values set in the client code > (2) the user-defined configuration file on the ClassPath (e.g. hdfs-site.xml under resources) > (3) the server's custom configuration (xxx-site.xml) > (4) the server's default configuration (xxx-default.xml).
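As a quick check of this ordering, here is a minimal sketch (the method name and target path are illustrative; it assumes the same imports as the test above and the file it uploaded): it reads back the replication factor that the NameNode actually recorded, which should be 2, because the value set in code overrides the dfs.replication=1 from the classpath hdfs-site.xml.

@Test
public void testReplicationPriority() throws IOException, InterruptedException, URISyntaxException {
	// "2" set in code > dfs.replication=1 in the project's hdfs-site.xml > server configuration
	Configuration configuration = new Configuration();
	configuration.set("dfs.replication", "2");
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "xusheng");
	// The file uploaded above; getReplication() returns the factor recorded by the NameNode
	Path target = new Path("/xiyou/huaguoshan/sunwukong.txt");
	System.out.println("effective replication: " + fs.getFileStatus(target).getReplication());
	fs.close();
}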

3.2.2 HDFS file download

@Test
public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "xusheng");
	// 2 Perform the download
	// boolean delSrc: whether to delete the source file
	// Path src: the HDFS path of the file to download
	// Path dst: the local path to download the file to
	// boolean useRawLocalFileSystem: whether to use RawLocalFileSystem (controls whether a local .crc checksum file is generated)
	fs.copyToLocalFile(false, new Path("/xiyou/huaguoshan/sunwukong.txt"), new Path("d:/sunwukong2.txt"), true);
	// 3 Close the resource
	fs.close();
}

**Note:** If the above code is executed and the file cannot be downloaded, it may be that your computer does not have enough runtime libraries supported by Microsoft, and you need to install the Microsoft runtime libraries.

3.2.3 HDFS file rename and move

@Test
public void testRename() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "xusheng");
	// 2 Rename the file
	fs.rename(new Path("/xiyou/huaguoshan/sunwukong.txt"), new Path("/xiyou/huaguoshan/meihouwang.txt"));
	// 3 Close the resource
	fs.close();
}

3.2.4 HDFS delete files and directories

@Test
public void testDelete() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "xusheng");
	// 2 Perform the deletion (the second argument enables recursive deletion)
	fs.delete(new Path("/xiyou"), true);
	// 3 Close the resource
	fs.close();
}

3.2.5 View HDFS file details

View file name, permissions, length, block information

@Test
public void testListFiles() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "xusheng");
	// 2 Get file details
	RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);
	while (listFiles.hasNext()) {
		LocatedFileStatus fileStatus = listFiles.next();
		System.out.println("========" + fileStatus.getPath() + "=========");
		System.out.println(fileStatus.getPermission());
		System.out.println(fileStatus.getOwner());
		System.out.println(fileStatus.getGroup());
		System.out.println(fileStatus.getLen());
		System.out.println(fileStatus.getModificationTime());
		System.out.println(fileStatus.getReplication());
		System.out.println(fileStatus.getBlockSize());
		System.out.println(fileStatus.getPath().getName());
		// Get block location information
		BlockLocation[] blockLocations = fileStatus.getBlockLocations();
		System.out.println(Arrays.toString(blockLocations));
	}
	// 3 Close the resource
	fs.close();
}


3.2.6 HDFS file and folder judgment

@Test
public void testListStatus() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system configuration
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "xusheng");
	// 2 Determine whether each entry is a file or a directory
	FileStatus[] listStatus = fs.listStatus(new Path("/"));
	for (FileStatus fileStatus : listStatus) {
		if (fileStatus.isFile()) {
			// It is a file
			System.out.println("f:" + fileStatus.getPath().getName());
		} else {
			System.out.println("d:" + fileStatus.getPath().getName());
		}
	}
	// 3 Close the resource
	fs.close();
}


3.2.7 Encapsulated code

package com.xusheng.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;


/**
 * Common pattern for client code:
 * 1. Get a client object
 * 2. Execute the relevant operations
 * 3. Close the resource
 * The same pattern applies to HDFS, ZooKeeper, and similar clients.
 */
public class HdfsClient {

    private FileSystem fs;

    @Before
    public void init() throws IOException, InterruptedException, URISyntaxException {
        // Address of the cluster's NameNode
        URI uri = new URI("hdfs://hadoop102:8020");
        // Create a configuration object
        Configuration configuration = new Configuration();
        configuration.set("dfs.replication", "2");
        // User
        String user = "xusheng";
        // 1. Get a client object
        fs = FileSystem.get(uri, configuration, user);
    }

    @After
    public void close() throws IOException {
        // 3. Close the resource
        fs.close();
    }

    @Test
    public void testmkdir() throws URISyntaxException, IOException, InterruptedException {
        // 2. Create a directory
        fs.mkdirs(new Path("/xiyou/huaguoshan"));
    }

    // Upload
    /**
     * Parameter priority (lowest to highest):
     * hdfs-default.xml => hdfs-site.xml on the server => configuration file in the project's resources directory => settings in code
     *
     * @throws IOException
     */
    @Test
    public void testPut() throws IOException {
        // Parameters: 1. whether to delete the source data;
        // 2. whether to allow overwriting;
        // 3. source data path; 4. destination path
        fs.copyFromLocalFile(false, true, new Path("D:\\sunwukong.txt"), new Path("hdfs://hadoop102/xiyou/huaguoshan"));
    }

    @Test
    public void testPut2() throws IOException {
        FSDataOutputStream fos = fs.create(new Path("/input"));

        fos.write("hello world".getBytes());
        // Close the stream so the data is flushed to HDFS
        fos.close();
    }

    // Download a file
    @Test
    public void testGet() throws IOException {
        // Parameters: 1. whether to delete the source file; 2. source path on HDFS; 3. destination path on Windows;
        // 4. whether to use RawLocalFileSystem (controls whether a local .crc checksum file is generated)
        fs.copyToLocalFile(true, new Path("hdfs://hadoop102/xiyou/huaguoshan/"), new Path("D:\\"), true);
        //fs.copyToLocalFile(false, new Path("hdfs://hadoop102/a.txt"), new Path("D:\\"), false);
    }

    // Delete
    @Test
    public void testRm() throws IOException {
        // Parameters: 1. path to delete; 2. whether to delete recursively
        // Delete a file
        //fs.delete(new Path("/jdk-8u212-linux-x64.tar.gz"), false);

        // Delete an empty directory
        //fs.delete(new Path("/xiyou"), false);

        // Delete a non-empty directory
        fs.delete(new Path("/jinguo"), true);
    }

    // Rename and move files
    @Test
    public void testmv() throws IOException {
        // Parameters: 1. source path; 2. destination path
        // Rename a file
        //fs.rename(new Path("/input/word.txt"), new Path("/input/ss.txt"));

        // Move and rename a file
        //fs.rename(new Path("/input/ss.txt"), new Path("/cls.txt"));

        // Rename a directory
        fs.rename(new Path("/input"), new Path("/output"));
    }

    // Get detailed file information
    @Test
    public void fileDetail() throws IOException {
        // Get information about all files
        RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);

        // Iterate over the files
        while (listFiles.hasNext()) {
            LocatedFileStatus fileStatus = listFiles.next();

            System.out.println("==========" + fileStatus.getPath() + "=========");
            System.out.println(fileStatus.getPermission());
            System.out.println(fileStatus.getOwner());
            System.out.println(fileStatus.getGroup());
            System.out.println(fileStatus.getLen());
            System.out.println(fileStatus.getModificationTime());
            System.out.println(fileStatus.getReplication());
            System.out.println(fileStatus.getBlockSize());
            System.out.println(fileStatus.getPath().getName());

            // Get block location information
            BlockLocation[] blockLocations = fileStatus.getBlockLocations();
            System.out.println(Arrays.toString(blockLocations));
        }
    }

    // Determine whether each entry is a directory or a file
    @Test
    public void testFile() throws IOException {
        FileStatus[] listStatus = fs.listStatus(new Path("/"));

        for (FileStatus status : listStatus) {
            if (status.isFile()) {
                System.out.println("File: " + status.getPath().getName());
            } else {
                System.out.println("Directory: " + status.getPath().getName());
            }
        }
    }
}

4. The reading and writing process of HDFS (interview focus)

4.1 HDFS write data process

4.1.1 Analysis of the file writing process

HDFS data writing process
(1) The client requests the NameNode to upload files through the Distributed FileSystem module, and the NameNode checks whether the target file exists and whether the parent directory exists.
(2) NameNode returns whether it can be uploaded.
(3) The client asks the NameNode which DataNode servers the first Block should be uploaded to.
(4) The NameNode returns three DataNode nodes: dn1, dn2, and dn3.
(5) The client requests dn1 to upload data through the FSDataOutputStream module. After receiving the request, dn1 calls dn2, and dn2 then calls dn3, completing the establishment of the communication pipeline.
(6) dn1, dn2, and dn3 acknowledge the client step by step.
(7) The client starts uploading the first Block to dn1 (it first reads the data from disk into a local memory cache), in units of Packets. When dn1 receives a Packet it passes it on to dn2, and dn2 passes it to dn3; every Packet dn1 sends is placed into an acknowledgement queue to wait for the acknowledgement.
(8) After one Block has been transmitted, the client asks the NameNode again for the DataNode servers for the second Block (steps 3-7 are repeated).
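From the client's point of view, this whole pipeline is hidden behind the FSDataOutputStream returned by FileSystem.create(). A minimal sketch (the method name and path are illustrative; it reuses the fs object from the HdfsClient class above):

@Test
public void testWriteStream() throws IOException {
	// create() asks the NameNode for the target DataNodes and opens the write pipeline
	FSDataOutputStream out = fs.create(new Path("/xiyou/write_demo.txt"));
	// write() buffers the data locally and ships it to dn1 -> dn2 -> dn3 in Packet units
	out.write("hello hdfs".getBytes());
	// close() flushes the remaining packets and waits for the acknowledgements
	out.close();
}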

4.1.2 Network topology - node distance calculation

During the process of writing data in HDFS, NameNode will select the DataNode with the closest distance to the data to be uploaded to receive the data. So how to calculate the shortest distance?
Node distance: the sum of the distances from each of the two nodes to their nearest common ancestor.

Distance(/d1/r1/n0, /d1/r1/n0) = 0 (processes on the same node)
Distance(/d1/r2/n0, /d1/r3/n2) = 4 (nodes on different racks in the same data center)
Distance(/d1/r1/n1, /d1/r1/n2) = 2 (different nodes on the same rack)
Distance(/d1/r2/n1, /d2/r4/n1) = 6 (nodes in different data centers)

For example, suppose there is a node n1 in rack r1 in data center d1. This node can be represented as /d1/r1/n1. The four distances above use this notation.
As an exercise, calculate the distance between every pair of nodes.
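A small helper makes this rule concrete. This is only an illustrative sketch (it is not Hadoop's internal NetworkTopology class): it splits the two topology paths, finds the nearest common ancestor, and adds up the remaining hops on each side.

// Illustrative only: computes the node distance from topology paths like "/d1/r1/n1".
public static int nodeDistance(String a, String b) {
	String[] pa = a.substring(1).split("/");   // e.g. ["d1", "r1", "n1"]
	String[] pb = b.substring(1).split("/");
	int common = 0;
	while (common < pa.length && common < pb.length && pa[common].equals(pb[common])) {
		common++;                              // depth of the nearest common ancestor
	}
	// hops from each node up to the common ancestor
	return (pa.length - common) + (pb.length - common);
}

// nodeDistance("/d1/r1/n0", "/d1/r1/n0") == 0
// nodeDistance("/d1/r1/n1", "/d1/r1/n2") == 2
// nodeDistance("/d1/r2/n0", "/d1/r3/n2") == 4
// nodeDistance("/d1/r2/n1", "/d2/r4/n1") == 6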

4.1.3 Rack awareness (replication storage node selection)

1) Rack Aware Description
(1) Official Description
Link: http://hadoop.apache.org/docs/r3.1.3/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Data_Replication .

For the common case, when the replication factor is three, HDFS’s
placement policy is to put one replica on the local machine if the writer
is on a datanode, otherwise on a random datanode, another replica on a
node in a different (remote) rack, and the last on a different node in
the same remote rack. This policy cuts the inter-rack write traffic which
generally improves write performance. The chance of rack failure is far
less than that of node failure; this policy does not impact data
reliability and availability guarantees. However, it does reduce the
aggregate network bandwidth used when reading data since a block is
placed in only two unique racks rather than three. With this policy, the
replicas of a file do not evenly distribute across the racks. One third
of replicas are on one node, two thirds of replicas are on one rack, and
the other third are evenly distributed across the remaining racks. This
policy improves write performance without compromising data reliability
or read performance.

(2) Source code description
Press Ctrl + N, find BlockPlacementPolicyDefault, and locate the chooseTargetInOrder method in this class.
2) Hadoop3.1.3 replica node selection
In short, with Hadoop 3.1.3 the first replica is placed on the node where the client resides (or on a random node if the client is outside the cluster), the second replica on a node in a different rack, and the third replica on a different node in the same rack as the second.

4.2 HDFS read data process

(1) The client requests the NameNode to download a file through the DistributedFileSystem module; the NameNode finds the DataNode addresses holding the file's blocks by querying its metadata.
(2) The client selects a DataNode (nearest first, then random) and requests to read the data.
(3) The DataNode starts transmitting data to the client (it reads the data from disk as an input stream and verifies it in units of Packets).
(4) The client receives the Packets, first caches them locally, and then writes them to the target file.
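On the client side this corresponds to FileSystem.open() returning an FSDataInputStream. A minimal sketch (the method name is illustrative; it reuses the fs object from the HdfsClient class above and IOUtils from org.apache.hadoop.io):

@Test
public void testReadStream() throws IOException {
	// open() asks the NameNode for the block locations, then reads from the nearest DataNode
	FSDataInputStream in = fs.open(new Path("/xiyou/huaguoshan/sunwukong.txt"));
	// Copy the stream to standard output; 4096 is the buffer size, true closes the streams when done
	IOUtils.copyBytes(in, System.out, 4096, true);
}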

5. NameNode and SecondaryNameNode

5.1 Working mechanism of NN and 2NN

Thinking: Where is the metadata in the NameNode stored?

First, let's make an assumption: if the metadata were stored on the NameNode's disk, efficiency would be far too low, because the metadata must be accessed randomly all the time in order to respond to client requests. Therefore, the metadata needs to be kept in memory. But if it existed only in memory, it would be lost as soon as the power went off, and the entire cluster would stop working. This is why the FsImage, a backup of the metadata on disk, is produced.

This brings a new problem: if the FsImage were updated every time the metadata in memory is updated, efficiency would be too low; but if it is not updated, a consistency problem arises, and data would be lost once the NameNode loses power. Therefore, the Edits file is introduced (it is append-only, which is very efficient). Whenever metadata is added or changed, the metadata in memory is modified and the operation is appended to Edits. This way, even if the NameNode loses power, the metadata can be reconstructed by combining FsImage and Edits.

However, if operations are appended to Edits for a long time, the file grows too large, efficiency drops, and recovering the metadata after a power failure takes too long. Therefore, FsImage and Edits need to be merged periodically. If the NameNode itself did this, efficiency would be too low, so a new node, the SecondaryNameNode, is introduced, dedicated to merging FsImage and Edits.
1) The first stage: NameNode start
(1) When the NameNode is started for the first time after formatting, the Fsimage and Edits files are created. If it is not the first start, the edit log and the image file are loaded directly into memory.
(2) The client requests to add, delete, or modify metadata.
(3) NameNode records the operation log and updates the rolling log.
(4) NameNode adds, deletes, and modifies metadata in memory.
2) The second stage: the Secondary NameNode at work
(1) The Secondary NameNode asks the NameNode whether a CheckPoint is needed and directly brings back the NameNode's answer.
(2) The Secondary NameNode requests that a CheckPoint be executed.
(3) The NameNode rolls the Edits log that is currently being written.
(4) The edit logs and the image file from before the roll are copied to the Secondary NameNode.
(5) Secondary NameNode loads the edit log and mirror file into memory and merges them.
(6) Generate a new image file fsimage.chkpoint.
(7) Copy fsimage.chkpoint to NameNode.
(8) NameNode renames fsimage.chkpoint to fsimage.

5.2 Analysis of Fsimage and Edits

The concept of Fsimage and Edits:
After the NameNode is formatted, the following files are generated in the /opt/module/hadoop-3.1.3/data/tmp/dfs/name/current directory:

fsimage_0000000000000000000
fsimage_0000000000000000000.md5
seen_txid
VERSION

(1) Fsimage file: A permanent checkpoint of HDFS file system metadata, which contains serialization information of all directories and file inodes of HDFS file system.
(2) Edits file: records all update operations performed on the HDFS file system. Every write operation performed by a file system client is first recorded in the Edits file.
(3) The seen_txid file stores a number: the transaction id of the last edits_ file.
(4) Every time the NameNode starts, it reads the Fsimage file into memory and replays the update operations in Edits, so that the metadata in memory is up to date and in sync; this can be seen as merging the Fsimage and Edits files at NameNode startup.

1) oiv view Fsimage file
(1) view oiv and oev commands

[xusheng@hadoop102 current]$ hdfs
oiv      apply the offline fsimage viewer to an fsimage
oev      apply the offline edits viewer to an edits file

(2) Basic syntax

hdfs oiv -p <file type> -i <fsimage file> -o <output path of the converted file>

(3) Case Practice

[xusheng@hadoop102 current]$ pwd
/opt/module/hadoop-3.1.3/data/dfs/name/current
[xusheng@hadoop102 current]$ hdfs oiv -p XML -i fsimage_0000000000000000025 -o /opt/module/hadoop-3.1.3/fsimage.xml
[xusheng@hadoop102 current]$ cat /opt/module/hadoop-3.1.3/fsimage.xml


Copy the content of the displayed XML file into an XML file created in IDEA and format it. Part of the result is shown below.

<inode>
	<id>16386</id>
	<type>DIRECTORY</type>
	<name>user</name>
	<mtime>1512722284477</mtime>
	<permission>xusheng:supergroup:rwxr-xr-x</permission>
	<nsquota>-1</nsquota>
	<dsquota>-1</dsquota>
</inode>
<inode>
	<id>16387</id>
	<type>DIRECTORY</type>
	<name>xusheng</name>
	<mtime>1512790549080</mtime>
	<permission>xusheng:supergroup:rwxr-xr-x</permission>
	<nsquota>-1</nsquota>
	<dsquota>-1</dsquota>
</inode>
<inode>
	<id>16389</id>
	<type>FILE</type>
	<name>wc.input</name>
	<replication>3</replication>
	<mtime>1512722322219</mtime>
	<atime>1512722321610</atime>
	<preferredBlockSize>134217728</preferredBlockSize>
	<permission>xusheng:supergroup:rw-r--r--</permission>
	<blocks>
		<block>
			<id>1073741825</id>
			<genstamp>1001</genstamp>
			<numBytes>59</numBytes>
		</block>
	</blocks>
</inode>

Thinking: notice that Fsimage does not record which DataNodes hold each block. Why?
Because after the cluster starts, the DataNodes are required to report their block information to the NameNode, and to report it again periodically.
2) oev View Edits file
(1) Basic syntax

hdfs oev -p <file type> -i <edits file> -o <output path of the converted file>

(2) Case Practice

[xusheng@hadoop102 current]$ hdfs oev -p XML -i edits_0000000000000000012-0000000000000000013 -o /opt/module/hadoop-3.1.3/edits.xml
[xusheng@hadoop102 current]$ cat /opt/module/hadoop-3.1.3/edits.xml


Copy the content of the displayed XML file into an XML file created in IDEA and format it. The result is shown below.

<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
	<EDITS_VERSION>-63</EDITS_VERSION>
	<RECORD>
		<OPCODE>OP_START_LOG_SEGMENT</OPCODE>
		<DATA>
			<TXID>129</TXID>
		</DATA>
	</RECORD>
	<RECORD>
		<OPCODE>OP_ADD</OPCODE>
		<DATA>
			<TXID>130</TXID>
			<LENGTH>0</LENGTH>
			<INODEID>16407</INODEID>
			<PATH>/hello7.txt</PATH>
			<REPLICATION>2</REPLICATION>
			<MTIME>1512943607866</MTIME>
			<ATIME>1512943607866</ATIME>
			<BLOCKSIZE>134217728</BLOCKSIZE>
			<CLIENT_NAME>DFSClient_NONMAPREDUCE_-1544295051_1</CLIENT_NAME>
			<CLIENT_MACHINE>192.168.10.102</CLIENT_MACHINE>
			<OVERWRITE>true</OVERWRITE>
			<PERMISSION_STATUS>
				<USERNAME>xusheng</USERNAME>
				<GROUPNAME>supergroup</GROUPNAME>
				<MODE>420</MODE>
			</PERMISSION_STATUS>
			<RPC_CLIENTID>908eafd4-9aec-4288-96f1-e8011d181561</RPC_CLIENTID>
			<RPC_CALLID>0</RPC_CALLID>
		</DATA>
	</RECORD>
	<RECORD>
		<OPCODE>OP_ALLOCATE_BLOCK_ID</OPCODE>
		<DATA>
			<TXID>131</TXID>
			<BLOCK_ID>1073741839</BLOCK_ID>
		</DATA>
	</RECORD>
	<RECORD>
		<OPCODE>OP_SET_GENSTAMP_V2</OPCODE>
		<DATA>
			<TXID>132</TXID>
			<GENSTAMPV2>1016</GENSTAMPV2>
		</DATA>
	</RECORD>
	<RECORD>
		<OPCODE>OP_ADD_BLOCK</OPCODE>
		<DATA>
			<TXID>133</TXID>
			<PATH>/hello7.txt</PATH>
			<BLOCK>
				<BLOCK_ID>1073741839</BLOCK_ID>
				<NUM_BYTES>0</NUM_BYTES>
				<GENSTAMP>1016</GENSTAMP>
		</BLOCK>
		<RPC_CLIENTID></RPC_CLIENTID>
		<RPC_CALLID>-2</RPC_CALLID>
	</DATA>
	</RECORD>
	<RECORD>
		<OPCODE>OP_CLOSE</OPCODE>
		<DATA>
			<TXID>134</TXID>
			<LENGTH>0</LENGTH>
			<INODEID>0</INODEID>
			<PATH>/hello7.txt</PATH>
			<REPLICATION>2</REPLICATION>
			<MTIME>1512943608761</MTIME>
			<ATIME>1512943607866</ATIME>
			<BLOCKSIZE>134217728</BLOCKSIZE>
			<CLIENT_NAME></CLIENT_NAME>
			<CLIENT_MACHINE></CLIENT_MACHINE>
			<OVERWRITE>false</OVERWRITE>
			<BLOCK>
				<BLOCK_ID>1073741839</BLOCK_ID>
				<NUM_BYTES>25</NUM_BYTES>
				<GENSTAMP>1016</GENSTAMP>
			</BLOCK>
			<PERMISSION_STATUS>
				<USERNAME>xusheng</USERNAME>
				<GROUPNAME>supergroup</GROUPNAME>
				<MODE>420</MODE>
			</PERMISSION_STATUS>
		</DATA>
	</RECORD>
</EDITS>

Thinking: How does the NameNode determine which Edits to merge at the next startup?

5.3 CheckPoint time setting

1) Normally, the SecondaryNameNode performs a checkpoint once every hour.
[hdfs-default.xml]

<property>
	<name>dfs.namenode.checkpoint.period</name>
	<value>3600s</value>
</property>

2) The number of operations is checked once a minute; when the number of operations reaches 1 million, the SecondaryNameNode performs a checkpoint.

<property>
	<name>dfs.namenode.checkpoint.txns</name>
	<value>1000000</value>
<description>Number of operation transactions</description>
</property>
<property>
	<name>dfs.namenode.checkpoint.check.period</name>
	<value>60s</value>
<description>Check the number of operations once per minute</description>
</property>

6. DataNodes

6.1 DataNode working mechanism


(1) A data block is stored on a DataNode's disk in the form of files: one file holds the data itself, and the other holds the metadata, including the length of the data block, the checksum of the block data, and the timestamp.
(2) After a DataNode starts, it registers with the NameNode; after the registration succeeds, it reports all of its block information to the NameNode periodically (every 6 hours).

The interval at which a DataNode reports its block information to the NameNode; the default is 6 hours:

<property>
	<name>dfs.blockreport.intervalMsec</name>
	<value>21600000</value>
	<description>Determines block reporting interval in
milliseconds.</description>
</property>

The interval at which a DataNode scans its own list of block information; the default is 6 hours:

<property>
 	<name>dfs.datanode.directoryscan.interval</name>
	<value>21600s</value>
	<description>Interval in seconds for Datanode to scan data
directories and reconcile the difference between blocks in memory and on the disk.
	Support multiple time unit suffix(case insensitive), as described  in dfs.heartbeat.interval.
	</description>
</property>

(3) A heartbeat is sent every 3 seconds, and the heartbeat response carries commands from the NameNode to the DataNode, such as copying a block to another machine or deleting a certain data block. If no heartbeat is received from a DataNode for more than 10 minutes, the node is considered unavailable.
(4) Machines can safely join and leave while the cluster is running.

6.2 Data Integrity

Thinking: if the data stored on a computer disk were the red-light (1) and green-light (0) signals controlling high-speed rail signal lights, and the disk holding the data went bad so that it always showed green, would that be dangerous? In the same way, if the data on a DataNode is corrupted but never detected, that is dangerous too. So how is this solved?
The following is the method of DataNode node to ensure data integrity.
(1) When a DataNode reads a Block, it computes the CheckSum.
(2) If the computed CheckSum differs from the value recorded when the Block was created, the Block is damaged.
(3) The Client then reads the Block from another DataNode.
(4) Common checksum algorithms: crc (32), md5 (128), sha1 (160).
(5) A DataNode periodically verifies the CheckSums of its files after they are created.
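As an illustration of the checksum idea only (this is not HDFS's internal implementation, which computes checksums over small chunks of each block), the sketch below uses java.util.zip.CRC32 to compare data read back against a checksum recorded at write time:

import java.util.zip.CRC32;

// Illustrative only: detect corruption by comparing against a stored CRC32 value.
public class ChecksumDemo {
	public static boolean verify(byte[] data, long storedChecksum) {
		CRC32 crc = new CRC32();
		crc.update(data);                         // checksum of the data read back
		return crc.getValue() == storedChecksum;  // a mismatch means the data is corrupted
	}

	public static void main(String[] args) {
		byte[] original = "shuguo".getBytes();
		CRC32 crc = new CRC32();
		crc.update(original);
		long stored = crc.getValue();             // checksum recorded when the data was written

		System.out.println(verify(original, stored));            // true: data intact
		System.out.println(verify("shuguX".getBytes(), stored)); // false: data corrupted
	}
}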

6.3 Parameter setting of disconnection time limit

DataNode offline time limit parameter setting
1. A DataNode process dies or a network fault prevents the DataNode from communicating with the NameNode.
2. The NameNode does not immediately judge the node as dead; it waits for a period of time, which is tentatively called the timeout period.
3. The default HDFS timeout period is 10 minutes + 30 seconds.
4. If the timeout period is defined as TimeOut, the formula for calculating the timeout period is:

TimeOut = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval

The default dfs.namenode.heartbeat.recheck-interval is 5 minutes, and the default dfs.heartbeat.interval is 3 seconds.
Note that in the hdfs-site.xml configuration file, the unit of dfs.namenode.heartbeat.recheck-interval is milliseconds and the unit of dfs.heartbeat.interval is seconds.

<property>
	<name>dfs.namenode.heartbeat.recheck-interval</name>
	<value>300000</value>
</property>
<property>
	<name>dfs.heartbeat.interval</name>
	<value>3</value>
</property>
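Plugging the defaults above into the formula confirms the 10 minutes + 30 seconds mentioned earlier:

TimeOut = 2 * 300000 ms + 10 * 3 s
        = 600 s + 30 s
        = 630 s = 10 min 30 s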


Origin blog.csdn.net/m0_52435951/article/details/124051951