Hadoop: Analyzing the Size of Non DFS Used

Non DFS Used is the space consumed by anything other than the Hadoop file system, for example the Linux system itself or other files stored on the disk.

Formula: Non DFS Used = (Total Capacity - Reserved Space) - Remaining Capacity - DFS Used

Detailed calculation:

Non DFS Used = Configured Capacity - Remaining Space - DFS Used
Configured Capacity = Total Disk Space - Reserved Space
Non DFS Used = (Total Disk Space - Reserved Space) - Remaining Space - DFS Used
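
You can read these counters straight off the NameNode before doing the math; a minimal check (the exact label text may vary slightly between Hadoop versions):

# Capacity counters reported per cluster and per DataNode
hdfs dfsadmin -report | grep -E 'Configured Capacity|DFS Used|Non DFS Used|DFS Remaining'
# The reserved space comes from the DataNode configuration
hdfs getconf -confKey dfs.datanode.du.reserved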

Let's take an example. Assume I have a 500 GB disk and I set the reserved space (dfs.datanode.du.reserved) to 50 GB.
On that disk, the system and other files take up 120 GB and DFS Used is 100 GB. If you run df -h, you will see the available space for that disk volume is 280 GB.
In the HDFS web UI, it will show:
Non DFS Used = 500 GB (Total) - 50 GB (Reserved) - 100 GB (DFS Used) - 280 GB (Remaining) = 70 GB
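
Plugging the same numbers into a quick shell check reproduces the result (all values in GB, taken from the example above):

# Hypothetical numbers from the example: 500 GB disk, 50 GB reserved, 100 GB DFS used, 280 GB free
total=500; reserved=50; dfs_used=100; remaining=280
echo "Non DFS Used = $(( (total - reserved) - remaining - dfs_used )) GB"   # -> Non DFS Used = 70 GB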

So it actually means: you initially configured 50 GB to be reserved for non-DFS usage and 450 GB for HDFS. However, it turns out that non-DFS usage exceeds the 50 GB reservation and eats up 70 GB of space that should belong to HDFS!
The term "Non DFS used" should really be renamed to something like "How much configured DFS capacity are occupied by non dfs use"
"Non DFS used" 应该解释为"配置的dfs的空间有多少空间被不是hdfs的文件占用了的"

One useful command is "lsof | grep delete", which helps you identify open files that have already been deleted. Sometimes Hadoop processes (such as hive, yarn, mapred, and hdfs) may still hold references to these deleted files, and those references keep occupying disk space.
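
As a rough way to see how much space those deleted-but-open files still pin down, you can sum lsof's SIZE/OFF column (field positions assume lsof's default output format, and a file held by several processes is counted more than once):

# Total the sizes of files that are deleted but still held open by some process
lsof -nP 2>/dev/null | grep '(deleted)' | awk '{sum += $7} END {printf "%.1f GB still held by deleted files\n", sum/1024/1024/1024}'
# Restarting the process shown in the COMMAND/PID columns releases that space.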


Running du -hsx * | sort -rh | head -10 helps list the ten largest folders.
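
If the big consumer is not obvious, run it from the mount point of the DataNode volume; the paths below are only examples, so substitute your own dfs.datanode.data.dir:

cd /data1                                        # hypothetical mount point of a DataNode data volume
du -hsx * | sort -rh | head -10                  # ten largest directories on this volume
du -hsx --exclude=dfs * | sort -rh | head -10    # GNU du: skip the HDFS data directory to see only non-DFS usage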


Reposted from blog.csdn.net/breakout_alex/article/details/88655060