Hadoop Distributed File System HDFS for Big Data

1. Introduction

HDFS (Hadoop Distributed File System) is the distributed file system of the Hadoop ecosystem. It is highly fault tolerant, provides high-throughput data access, and can be deployed on low-cost hardware.

2. HDFS design principle


3. Illustrated introduction to the principles

[Figure: schematic diagram of reading data]
The three major components of Hadoop: HDFS (distributed storage system), YARN (resource manager), and MapReduce (distributed computing framework).

After Hadoop is installed and configured, it comes with all three major components.

4. Commonly used shell commands for HDFS

  1. Show the directory structure
# List the contents of a directory
hadoop fs -ls <path>
# List the contents of a directory recursively
hadoop fs -ls -R <path>
# List the contents of the root directory
hadoop fs -ls /
  1. Create a directory
# Create a directory
hadoop fs -mkdir <path>
# Create a directory together with any missing parent directories
hadoop fs -mkdir -p <path>
  1. Delete files and directories
# Delete a file
hadoop fs -rm <path>
# Recursively delete a directory and the files under it
hadoop fs -rm -R <path>
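A recursive delete on the wrong path can remove a lot of data, so it can help to wrap the command in a small guard. The following is a minimal sketch and not part of the original article: the function name hdfs_safe_rm is hypothetical, and it only assumes that the hadoop command is on the PATH.

```shell
# Hypothetical helper (not from the original article): refuse obviously
# dangerous targets before running a recursive delete.
hdfs_safe_rm() {
    target="$1"
    case "$target" in
        ""|"/")
            # An empty path or the root directory is almost certainly a mistake
            echo "refusing to delete '$target'" >&2
            return 1
            ;;
    esac
    # Recursively delete the file or directory at the given HDFS path
    hadoop fs -rm -R "$target"
}
```

It would be called as hdfs_safe_rm /tmp/old-output; the guard is deliberately simple and does not replace careful review of the path.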
  1. Upload files from the local file system to HDFS
# Either of the two commands works
hadoop fs -put [localsrc] [dst]
hadoop fs -copyFromLocal [localsrc] [dst]
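The directory creation and upload commands are often used together. The following is a minimal sketch of that combination, not taken from the original article: the function name hdfs_upload and its two arguments are hypothetical, and the hadoop command is assumed to be on the PATH.

```shell
# Hypothetical helper (not from the original article): upload one local file
# into an HDFS directory, creating the directory first if needed.
hdfs_upload() {
    local_file="$1"
    hdfs_dir="$2"
    if [ ! -f "$local_file" ]; then
        echo "local file not found: $local_file" >&2
        return 1
    fi
    # -p creates missing parent directories and does not fail
    # if the directory already exists
    hadoop fs -mkdir -p "$hdfs_dir" || return 1
    # Copy the local file into the HDFS directory
    hadoop fs -put "$local_file" "$hdfs_dir/"
}
```

For example, hdfs_upload ./access.log /data/logs would create /data/logs if it is missing and then upload the file into it. Both paths here are placeholders.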
  1. Download files from HDFS to the local file system
# Either of the two commands works
hadoop fs -get [src] [localdst]
hadoop fs -copyToLocal [src] [localdst]
  1. View file content
# Either command works for plain text files; -text can also decode some compressed formats
hadoop fs -text <path>
hadoop fs -cat <path>
  1. Display the last kilobyte of a file
hadoop fs -tail <path>
# As with tail -f on Linux, keep watching the file for changes and display its last kilobyte
hadoop fs -tail -f <path>
  1. Copy files
hadoop fs -cp [src] [dst]
  1. Move files
hadoop fs -mv [src] [dst]
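  1. Show the size of each file in a directory
# Sizes are reported in bytes by default
# -s : show the total size of all files instead of one line per file
# -h : show sizes in a human-readable form (for example 64.0m instead of 67108864)
hadoop fs -du [-s] [-h] <path>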
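When the size is needed inside a script, the summary form without -h is easier to handle because the value stays a plain number. The sketch below is an illustration only: the function name hdfs_total_bytes is hypothetical, and it assumes that the first column of the -du -s output is the size in bytes (some Hadoop versions print additional columns after it).

```shell
# Hypothetical helper (not from the original article): print the total size
# of an HDFS path in bytes.
hdfs_total_bytes() {
    # -s prints one summary line; the first column is assumed to be the
    # size in bytes. -h is deliberately omitted so the value stays numeric.
    hadoop fs -du -s "$1" | awk '{ print $1 }'
}
```

A call such as hdfs_total_bytes /user/hadoop/dir1 would then print a single number that can be compared or summed in a script.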
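  1. Merge and download multiple files
# -nl : add a newline (LF) at the end of each file
# -skip-empty-file : skip empty files
hadoop fs -getmerge [-nl] [-skip-empty-file] <src> <localdst>
# Example: merge hbase-policy.xml and hbase-site.xml on HDFS and download the result to the local file /usr/test.xml
hadoop fs -getmerge -nl /test/hbase-policy.xml /test/hbase-site.xml /usr/test.xml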
  1. Show file system free space
hadoop fs -df -h /
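  1. Change the file replication factor
# Change the replication factor of a file. If path is a directory, the replication factor of all files under it is changed.
# -w : wait for the replication to complete before the command returns
hadoop fs -setrep [-R] [-w] <numReplicas> <path>
# Example
hadoop fs -setrep -w 3 /user/hadoop/dir1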
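  1. Access control
# Permissions are managed the same way as on Linux
# Change the group of a file or directory. The user must be the owner of the file or a superuser.
hadoop fs -chgrp [-R] GROUP URI [URI ...]
# Change the permissions of a file or directory. The user must be the owner of the file or a superuser.
hadoop fs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]
# Change the owner of a file. The user must be a superuser.
hadoop fs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]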
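Ownership and permissions are usually set together when a directory is handed over to a user. The following sketch combines the two commands; the function name hdfs_set_access and its four arguments are hypothetical and not part of the original article, and the caller is assumed to have superuser rights because -chown requires them.

```shell
# Hypothetical helper (not from the original article): assign a directory
# tree to one owner and group, then set its permission mode.
hdfs_set_access() {
    path="$1"
    owner="$2"
    group="$3"
    mode="$4"
    # Changing the owner requires superuser rights
    hadoop fs -chown -R "$owner:$group" "$path" || return 1
    # Apply the mode (for example 750) to the whole tree
    hadoop fs -chmod -R "$mode" "$path"
}
```

An example call would be hdfs_set_access /user/hadoop/dir1 hadoop hadoop 750, where the user, group and mode are placeholders.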
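  1. Test a path
hadoop fs -test -[defswrz] URI
# Options:

# -d : returns 0 if the path is a directory
# -e : returns 0 if the path exists
# -f : returns 0 if the path is a file
# -s : returns 0 if the path is not empty
# -r : returns 0 if the path exists and read permission is granted
# -w : returns 0 if the path exists and write permission is granted
# -z : returns 0 if the file has zero length
# Example
hadoop fs -test -e filename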
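Because -test reports its result only through the exit code, it is mostly useful inside scripts. The sketch below shows one way to branch on the exit codes listed above; the function name hdfs_path_kind is hypothetical and not part of the original article, and the hadoop command is assumed to be on the PATH.

```shell
# Hypothetical helper (not from the original article): classify an HDFS path
# using the exit codes of "hadoop fs -test".
hdfs_path_kind() {
    if hadoop fs -test -d "$1"; then
        echo "directory"
    elif hadoop fs -test -f "$1"; then
        echo "file"
    else
        # A non-zero exit code is treated as "missing" here; it could also
        # mean that the command itself failed.
        echo "missing"
    fi
}
```

A script could then write, for instance, [ "$(hdfs_path_kind /user/hadoop/dir1)" = "directory" ] before uploading into that directory.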

5. HDFS related Java API

direct reference address

Origin blog.csdn.net/zouyang920/article/details/130389620