"Attack on Big Data" series of tutorials on the basis of hadoop big data

Table of Contents

Preface

1. Using HTTP to access HDFS

2. HDFS components and their functions

3. Data blocks in HDFS (Block)

4. Operating HDFS with the Java API

5. Notes on developing HDFS applications in Java

6. The role of the DataNode heartbeat mechanism

7. The role of EditsLog and FSImage in the NameNode

8. How the SecondaryNameNode reduces the NameNode's burden

9. How to scale out the NameNode?

10. Commands to create a file snapshot (backup)

11. Balancing data

12. Safe mode (safemode)


Preface

For more than a year I have been busy with business development on the Java web side and have largely let big data slip; this series of big data tutorial posts is my way of picking it back up.

I will not go over downloading, installing, and starting Hadoop and HDFS here; if you need that, see my earlier blog posts.

By default we assume a three-node master-slave Hadoop cluster has already been set up: one master and two slaves.

1. Using HTTP to access HDFS

Add the following configuration to hdfs-site.xml and then restart HDFS:

<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
    <description>Enable access to HDFS over HTTP (WebHDFS)</description>
</property>

Access over HTTP:

Query the status of the error.txt file under /user/hadoop-twq/cmd:

http://master:50070/webhdfs/v1/user/hadoop-twq/cmd/error.txt?op=LISTSTATUS

Read its content:

http://master:50070/webhdfs/v1/user/hadoop-twq/cmd/error.txt?op=OPEN

For the full list of supported op values, see:

http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
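
These requests can also be issued from the command line with curl; a minimal sketch, assuming the same default NameNode HTTP port 50070 used in the URLs above:

# List the status of a file or directory
curl -i "http://master:50070/webhdfs/v1/user/hadoop-twq/cmd/error.txt?op=LISTSTATUS"

# Read the file content; OPEN redirects to a DataNode, so let curl follow the redirect
curl -i -L "http://master:50070/webhdfs/v1/user/hadoop-twq/cmd/error.txt?op=OPEN"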

2. HDFS components and their functions

3. Data blocks in HDFS (Block)

The default data block size is 128 MB.

To set the block size to 256 MB (256*1024*1024 = 268435456 bytes), add the following to ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml (in Hadoop 2.x the property is named dfs.blocksize; dfs.block.size is its deprecated alias):

<property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
</property>

The default number of replicas (backups) per data block is 3.

Setting the number of replicas:

  • For a specific file:

hadoop fs -setrep 2 /user/hadoop-twq/cmd/big_file.txt

  • Globally, in hdfs-site.xml:

<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>

Data blocks are stored as ordinary files on the local disks of the DataNodes.
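
To verify the block size and replication factor a file actually ended up with, fsck can be used; a quick check, not part of the original post:

hadoop fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks -locations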

4. Operating HDFS with the Java API

Use the Java API to write data to a file

package com.dzx.hadoopdemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

/**
 * @author duanzhaoxu
 * @ClassName:
 * @Description:
 * @date 2020-12-17 17:35:37
 */
public class hdfs {
    public static void main(String[] args) throws Exception {
        String content = "this is an example";
        String dest = "hdfs://master:9999/user/hadoop-twq/cmd/java_writer.txt";
        Configuration configuration = new Configuration();
        // Obtain a FileSystem handle for the cluster addressed by the URI
        FileSystem fileSystem = FileSystem.get(URI.create(dest), configuration);
        // Create (overwriting if it exists) the target file and write the bytes to it
        FSDataOutputStream fsDataOutputStream = fileSystem.create(new Path(dest));
        fsDataOutputStream.write(content.getBytes(StandardCharsets.UTF_8));
        fsDataOutputStream.close();
    }
}

Use the Java API to read a file

package com.dzx.hadoopdemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

/**
 * @author duanzhaoxu
 * @ClassName:
 * @Description:
 * @date 2020-12-17 17:35:37
 */
public class hdfs {
    public static void main(String[] args) throws Exception {
        String dest = "hdfs://master:9999/user/hadoop-twq/cmd/java_writer.txt";
        Configuration configuration = new Configuration();
        FileSystem fileSystem = FileSystem.get(URI.create(dest), configuration);
        // Open the file and wrap the stream in a reader to print it line by line
        FSDataInputStream fsDataInputStream = fileSystem.open(new Path(dest));
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(fsDataInputStream));
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            System.out.println(line);
        }
        bufferedReader.close();
        fsDataInputStream.close();
    }
}

 

Use the Java API to get file status information

package com.dzx.hadoopdemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

import java.net.URI;

/**
 * @author duanzhaoxu
 * @ClassName:
 * @Description:
 * @date 2020-12-17 17:35:37
 */
public class hdfs {
    public static void main(String[] args) throws Exception {
        // Get the status information of a single file
        String dest = "hdfs://master:9999/user/hadoop-twq/cmd/java_writer.txt";
        Configuration configuration = new Configuration();
        FileSystem fileSystem = FileSystem.get(URI.create("hdfs://master:9999/"), configuration);
        FileStatus fileStatus = fileSystem.getFileStatus(new Path(dest));

        System.out.println(fileStatus.getPath());
        System.out.println(fileStatus.getAccessTime());
        System.out.println(fileStatus.getBlockSize());
        System.out.println(fileStatus.getGroup());
        System.out.println(fileStatus.getLen());
        System.out.println(fileStatus.getModificationTime());
        System.out.println(fileStatus.getOwner());
        System.out.println(fileStatus.getPermission());
        System.out.println(fileStatus.getReplication());
        System.out.println(fileStatus.getSymlink());

        // Get the status information of every file under a directory
        FileStatus[] fileStatuses = fileSystem.listStatus(new Path("hdfs://master:9999/user/hadoop-twq/cmd"));
        for (FileStatus status : fileStatuses) {
            System.out.println(status.getPath());
            System.out.println(status.getAccessTime());
            System.out.println(status.getBlockSize());
            System.out.println(status.getGroup());
            System.out.println(status.getLen());
            System.out.println(status.getModificationTime());
            System.out.println(status.getOwner());
            System.out.println(status.getPermission());
            System.out.println(status.getReplication());
            System.out.println(status.getSymlink());
        }

        // Create a directory
        fileSystem.mkdirs(new Path("hdfs://master:9999/user/hadoop-twq/cmd/java"));
        // Create a directory with the permissions rwx--x---
        fileSystem.mkdirs(new Path("hdfs://master:9999/user/hadoop-twq/cmd/temp"), new FsPermission(FsAction.ALL, FsAction.EXECUTE, FsAction.NONE));

        // Delete the specified file (non-recursive)
        fileSystem.delete(new Path("hdfs://master:9999/user/hadoop-twq/cmd/java/1.txt"), false);
        // Delete the specified directory and everything in it (recursive)
        fileSystem.delete(new Path("hdfs://master:9999/user/hadoop-twq/cmd/java"), true);

    }
}

5. Notes on developing HDFS applications in Java

// Put core-site.xml under the resources directory so that the HDFS host and port configuration is read automatically
String dest = "user/hadoop-twq/cmd/java_writer.txt";
Configuration configuration = new Configuration();
FileSystem fileSystem = FileSystem.get(configuration);
FileStatus fileStatus = fileSystem.getFileStatus(new Path(dest));
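
As a minimal sketch, the core-site.xml placed under the resources directory (e.g. src/main/resources in a Maven project) only needs to point at the default file system; the address below is the hdfs://master:9999 used in the earlier examples:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9999</value>
    </property>
</configuration>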

6. The role of the DataNode heartbeat mechanism

7. The role of EditsLog and FSImage in the NameNode

8. How the SecondaryNameNode reduces the NameNode's burden

9. How to scale out the NameNode?

Add the following configuration to hdfs-site.xml on the master node.
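
The exact values are not reproduced in this text. As a rough sketch only (the nameservice names ns1/ns2 and the hosts below are assumptions, not the original values), an HDFS federation setup in hdfs-site.xml typically declares the nameservices and an RPC address for each NameNode:

<property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>master:9999</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>slave1:9999</value>
</property>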

 

Check the clusterID of the master node.
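
The clusterID lives in the VERSION file under the NameNode metadata directory; one way to check it (a sketch, the path shown is only an example of what dfs.namenode.name.dir might point to):

# print the configured NameNode metadata directory
hdfs getconf -confKey dfs.namenode.name.dir
# read the clusterID field from the VERSION file inside that directory (example path)
grep clusterID /data/hadoop/namenode/current/VERSION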

Copy the hdfs-site.xml from the master node to slave1 and slave2.

Once the federation contains several NameNodes, a Java API client no longer knows which NameNode's ip:port to address, so we configure viewFs.

First comment out the fs.defaultFS entry in core-site.xml, then add the following configuration (fs.default.name is the deprecated alias of fs.defaultFS):

<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="mountTable.xml"/>
    <property>
        <name>fs.default.name</name>
        <value>viewfs://my-cluster</value>
    </property>
</configuration>

Then add a mountTable.xml file (a mount table that maps parts of the file system namespace to different NameNodes, which effectively spreads the metadata managed by one NameNode across several NameNodes).
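
The file itself is not reproduced in the original text; the following is only a sketch of what a mountTable.xml can look like, assuming the cluster name my-cluster from core-site.xml above and two NameNodes reachable at master:9999 and slave1:9999 (the mount points and hosts are illustrative):

<configuration>
    <property>
        <name>fs.viewfs.mounttable.my-cluster.link./user</name>
        <value>hdfs://master:9999/user</value>
    </property>
    <property>
        <name>fs.viewfs.mounttable.my-cluster.link./data</name>
        <value>hdfs://slave1:9999/data</value>
    </property>
</configuration>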

Then synchronize the modified configuration files to slave1 and slave2 and restart the HDFS cluster.

After the restart, ordinary commands can be run on any node, for example:

hadoop fs -ls viewfs://my-cluster/

10. Commands to create a file snapshot (backup)

Allow snapshots to be created on the specified directory:

hadoop dfsadmin -allowSnapshot  /user/hadoop-twq/data

Create a snapshot

hadoop fs -createSnapshot /user/hadoop-twq/data  data-20180317-snapshot

View the created snapshot file

hadoop fs -ls /user/hadoop-twq/data/.snapshot/data-20180317-snapshot

Other snapshot-related commands:
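
Commonly used ones include the following (names taken from the standard HDFS snapshot commands; the renamed snapshot name here is only illustrative):

# rename a snapshot
hadoop fs -renameSnapshot /user/hadoop-twq/data data-20180317-snapshot data-20180318-snapshot
# delete a snapshot
hadoop fs -deleteSnapshot /user/hadoop-twq/data data-20180318-snapshot
# list all snapshottable directories for the current user
hdfs lsSnapshottableDir
# show the differences between two snapshots of a directory
hdfs snapshotDiff /user/hadoop-twq/data data-20180317-snapshot data-20180318-snapshot
# disallow further snapshots on a directory
hadoop dfsadmin -disallowSnapshot /user/hadoop-twq/data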

11. Balancing data

When the HDFS cluster is expanded, the newly added nodes inevitably hold less data than the old ones. To spread the data evenly across the cluster, use the hdfs balancer command.
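
For example (the threshold is the maximum allowed difference, in percentage points, between each DataNode's disk usage and the cluster average; 10 is a common choice, not a value from the original post):

hdfs balancer -threshold 10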

12. Safe mode (safemode)

While safe mode is on, directories and files cannot be created or deleted; they can only be listed and read.

hadoop dfsadmin -safemode get
Safe mode is OFF

hadoop dfsadmin -safemode enter
Safe mode is ON

hadoop dfsadmin -safemode leave
Safe mode is OFF


Origin blog.csdn.net/qq_31905135/article/details/111317591