Table of Contents
Preface
1. Use HTTP to access HDFS
2. HDFS components and their functions
3. The data block in HDFS (Block)
4. Java API operations on HDFS
5. Notes for developing HDFS applications in Java
6. The role of the DataNode heartbeat mechanism
7. The role of EditsLog and FSImage in the NameNode
8. SecondaryNameNode helps the NameNode reduce its burden
9. How to expand the NameNode?
10. Create a file snapshot (backup)
11. Balance data
12. Safemode (safe mode)
Preface
For more than a year I have been busy with business development on the Java web side and have mostly forgotten about big data, so I am publishing this series of big data tutorial posts to pick it back up.
I will not walk through downloading, installing, starting, and stopping Hadoop and HDFS one by one here; if you need that, see my earlier blog posts.
In what follows, I assume we have already built a three-node master-slave Hadoop cluster: one master and two slaves.
1. Use HTTP to access HDFS
Add the following configuration to hdfs-site.xml, and then restart HDFS:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
<description>Enable access to HDFS over HTTP (WebHDFS)</description>
</property>
Access over HTTP:
Query the error.txt file under /user/hadoop-twq/cmd:
http://master:50070/webhdfs/v1/user/hadoop-twq/cmd/error.txt?op=LISTSTATUS
http://master:50070/webhdfs/v1/user/hadoop-twq/cmd/error.txt?op=OPEN
For the full list of supported op values, see:
http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
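As an illustration (my addition, not from the original post), a small Java sketch that reads the same file through the WebHDFS REST interface might look like the following; the host master, port 50070, and file path are taken from the URLs above and are assumptions about your cluster:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsOpenExample {
    public static void main(String[] args) throws Exception {
        // op=OPEN streams the file content; the NameNode redirects the request to a DataNode.
        URL url = new URL("http://master:50070/webhdfs/v1/user/hadoop-twq/cmd/error.txt?op=OPEN");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            conn.disconnect();
        }
    }
}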
2. HDFS components and their functions
3. The data block in HDFS (Block)
The default data block size is 128 MB.
To set the block size to 256 MB (256*1024*1024 = 268435456 bytes), add the following configuration to ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml:
<property>
<name>dfs.block.size</name>
<value>268435456</value>
</property>
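As a hedged addition (not in the original post), the block size can also be chosen per file through the Java API when the file is created; the path and the 2-replica/256 MB values below are illustrative assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class CreateWithBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        FileSystem fileSystem = FileSystem.get(URI.create("hdfs://master:9999/"), configuration);
        // create(path, overwrite, bufferSize, replication, blockSize):
        // this file is written with 2 replicas and a 256 MB block size.
        FSDataOutputStream out = fileSystem.create(
                new Path("/user/hadoop-twq/cmd/big_block_file.txt"),
                true, 4096, (short) 2, 256L * 1024 * 1024);
        out.write("data written with a custom block size".getBytes("UTF-8"));
        out.close();
    }
}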
The default number of replicas (backups) for each data block is 3.
Set the number of data block replicas:
- Set the number of replicas for a specific file:
hadoop fs -setrep 2 /user/hadoop-twq/cmd/big_file.txt
- Set the global default number of replicas directly in hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
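As a complementary sketch (again my addition, not from the original post), the replication factor of an existing file can also be changed through the Java API; the NameNode address and path reuse the example values from this post:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class SetReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        FileSystem fileSystem = FileSystem.get(URI.create("hdfs://master:9999/"), configuration);
        // Equivalent to: hadoop fs -setrep 2 /user/hadoop-twq/cmd/big_file.txt
        boolean changed = fileSystem.setReplication(new Path("/user/hadoop-twq/cmd/big_file.txt"), (short) 2);
        System.out.println("replication changed: " + changed);
    }
}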
Data blocks are stored as files on the local disk of the machine where each DataNode runs.
4. Java API operations on HDFS
Use the Java API to write data to a file
package com.dzx.hadoopdemo;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
/**
* @author duanzhaoxu
* @ClassName:
* @Description:
* @date 2020-12-17 17:35:37
*/
public class hdfs {
public static void main(String[] args) throws Exception {
String content = "this is a example";
String dest = "hdfs://master:9999/user/hadoop-twq/cmd/java_writer.txt";
Configuration configuration = new Configuration();
FileSystem fileSystem = FileSystem.get(URI.create(dest), configuration);
FSDataOutputStream fsDataOutputStream = fileSystem.create(new Path(dest));
fsDataOutputStream.write(content.getBytes(StandardCharsets.UTF_8));
fsDataOutputStream.close();
}
}
Use the Java API to read a file
package com.dzx.hadoopdemo;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
/**
* @author duanzhaoxu
* @ClassName:
* @Description:
* @date 2020-12-17 17:35:37
*/
public class hdfs {
public static void main(String[] args) throws Exception {
String dest = "hdfs://master:9999/user/hadoop-twq/cmd/java_writer.txt";
Configuration configuration = new Configuration();
FileSystem fileSystem = FileSystem.get(URI.create(dest), configuration);
FSDataInputStream fsDataInputStream = fileSystem.open(new Path(dest));
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(fsDataInputStream));
String line = null;
while ((line = bufferedReader.readLine()) != null) {
System.out.println(line);
}
fsDataInputStream.close();
bufferedReader.close();
}
}
Use the Java API to get file status information
package com.dzx.hadoopdemo;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
/**
* @author duanzhaoxu
* @ClassName:
* @Description:
* @date 2020-12-17 17:35:37
*/
public class hdfs {
public static void main(String[] args) throws Exception {
// Get the file status information of a specific file
String dest = "hdfs://master:9999/user/hadoop-twq/cmd/java_writer.txt";
Configuration configuration = new Configuration();
FileSystem fileSystem = FileSystem.get(URI.create("hdfs://master:9999/"), configuration);
FileStatus fileStatus = fileSystem.getFileStatus(new Path(dest));
System.out.println(fileStatus.getPath());
System.out.println(fileStatus.getAccessTime());
System.out.println(fileStatus.getBlockSize());
System.out.println(fileStatus.getGroup());
System.out.println(fileStatus.getLen());
System.out.println(fileStatus.getModificationTime());
System.out.println(fileStatus.getOwner());
System.out.println(fileStatus.getPermission());
System.out.println(fileStatus.getReplication());
// getSymlink() throws an IOException for a file that is not a symbolic link, so guard with isSymlink()
if (fileStatus.isSymlink()) {
System.out.println(fileStatus.getSymlink());
}
// Get the file status information of every file under the specified directory
FileStatus[] fileStatuses = fileSystem.listStatus(new Path("hdfs://master:9999/user/hadoop-twq/cmd"));
for (FileStatus status : fileStatuses) {
System.out.println(status.getPath());
System.out.println(status.getAccessTime());
System.out.println(status.getBlockSize());
System.out.println(status.getGroup());
System.out.println(status.getLen());
System.out.println(status.getModificationTime());
System.out.println(status.getOwner());
System.out.println(status.getPermission());
System.out.println(status.getReplication());
if (status.isSymlink()) {
System.out.println(status.getSymlink());
}
}
// Create a directory
fileSystem.mkdirs(new Path("hdfs://master:9999/user/hadoop-twq/cmd/java"));
// Create a directory with the permission rwx--x---
fileSystem.mkdirs(new Path("hdfs://master:9999/user/hadoop-twq/cmd/temp"), new FsPermission(FsAction.ALL, FsAction.EXECUTE, FsAction.NONE));
// Delete the specified file (non-recursive)
fileSystem.delete(new Path("hdfs://master:9999/user/hadoop-twq/cmd/java/1.txt"), false);
// Delete the specified directory recursively
fileSystem.delete(new Path("hdfs://master:9999/user/hadoop-twq/cmd/java"), true);
}
}
5. Notes for developing HDFS applications in Java
// Put core-site.xml in the resources directory so that the HDFS host and port configuration is read automatically
String dest = "user/hadoop-twq/cmd/java_writer.txt";
Configuration configuration = new Configuration();
FileSystem fileSystem = FileSystem.get(configuration);
FileStatus fileStatus = fileSystem.getFileStatus(new Path(dest));
6. The role of the DataNode heartbeat mechanism
7. The role of EditsLog and FSImage in the NameNode
8. SecondaryNameNode helps the NameNode reduce its burden
9. How to expand the NameNode?
Add the NameNode federation configuration to hdfs-site.xml on the master node (a hedged sketch of what this configuration might look like is given below).
View the clusterId of the master node.
Copy hdfs-site.xml from the master node to slave1 and slave2.
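The original post does not reproduce the configuration itself. As a rough, assumed sketch, a federation-style hdfs-site.xml that adds a second NameNode on slave1 might contain properties like the following; the nameservice names ns1/ns2 and the ports reuse the example values from this post and are placeholders:
<property>
<name>dfs.nameservices</name>
<value>ns1,ns2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>master:9999</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns2</name>
<value>slave1:9999</value>
</property>
<property>
<name>dfs.namenode.http-address.ns2</name>
<value>slave1:50070</value>
</property>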
Once the cluster has more than one NameNode, a Java API client no longer knows which NameNode's ip:port it should point at, so we need to configure viewfs.
First comment out the fs.defaultFS configuration item in core-site.xml, and then add the following configuration
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="mountTable.xml"/>
<property>
<name>fs.default.name</name>
<value>viewfs://my-cluster</value>
</property>
</configuration>
Then add a mountTable.xml file (a mount-table mapping that distributes the metadata managed by the NameNode across the different NameNode nodes); a sketch is given below.
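The original mountTable.xml is not shown in the post. A minimal assumed sketch, mapping /user and /data of the my-cluster view onto two NameNodes, might look like this (the paths and hosts are placeholders):
<configuration>
<property>
<name>fs.viewfs.mounttable.my-cluster.link./user</name>
<value>hdfs://master:9999/user</value>
</property>
<property>
<name>fs.viewfs.mounttable.my-cluster.link./data</name>
<value>hdfs://slave1:9999/data</value>
</property>
</configuration>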
Then synchronize the modified configuration files to slave1 and slave2, and restart the HDFS cluster.
After the restart, the usual commands work on any node, for example:
hadoop fs -ls viewfs://my-cluster/
10. Create a file snapshot (backup)
Allow snapshots to be created on the specified directory:
hadoop dfsadmin -allowSnapshot /user/hadoop-twq/data
Create a snapshot
hadoop fs -createSnapshot /user/hadoop-twq/data data-20180317-snapshot
View the created snapshot file
hadoop fs -ls /user/hadoop-twq/data/.snapshot/data-20180317-snapshot
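For completeness (my addition, not from the original post), a snapshot can also be created through the Java API once the directory has been made snapshottable with the dfsadmin command above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class CreateSnapshotExample {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        FileSystem fileSystem = FileSystem.get(URI.create("hdfs://master:9999/"), configuration);
        // Equivalent to: hadoop fs -createSnapshot /user/hadoop-twq/data data-20180317-snapshot
        Path snapshot = fileSystem.createSnapshot(new Path("/user/hadoop-twq/data"), "data-20180317-snapshot");
        System.out.println("snapshot created at: " + snapshot);
    }
}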
There are other snapshot-related commands as well; they are not covered here.
11. Balance data
When we expand an HDFS cluster, the newly added nodes inevitably end up holding less data than the old ones. To redistribute the data evenly across the cluster, you can run the hdfs balancer command.
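A typical invocation (my example; the threshold value is an assumption meaning each DataNode's usage may deviate from the cluster average by at most 10%):
hdfs balancer -threshold 10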
12. Safemode (safe mode)
While safe mode is on, directories and files cannot be created or deleted; they can only be read.
hadoop dfsadmin -safemode get
Safe mode is OFF
hadoop dfsadmin -safemode enter
Safe mode is ON
hadoop dfsadmin -safemode leave
Safe mode is OFF