HDFS provides the fsck command to check the health of HDFS files and directories, and to view block information and block locations. If we execute hdfs fsck on the master machine with no arguments, we can see the command's usage:
[hadoop-twq@master ~]$ hdfs fsck
Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-includeSnapshots] [-storagepolicies] [-blockId <blk_Id>]
        <path>  start checking from this path
        -move   move corrupted files to /lost+found
        -delete delete corrupted files
        -files  print out files being checked
        -openforwrite   print out files opened for write
        -includeSnapshots       include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
        -list-corruptfileblocks print out list of missing blocks and files they belong to
        -blocks print out block report
        -locations      print out locations for every block
        -racks  print out network topology for data-node locations
        -storagepolicies        print out storage policy summary for the blocks
        -blockId        print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)
Viewing the health information of a file or directory
Execute the following command:
hdfs fsck /user/hadoop-twq/cmd
to view the health information of the /user/hadoop-twq/cmd directory:
One particularly important field is Corrupt blocks, which indicates the number of corrupted data blocks.
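As a sketch, a monitoring script can pull this count out of the fsck summary. The summary text in the heredoc below is illustrative, not output from a real cluster; on a live cluster you would pipe the output of `hdfs fsck /user/hadoop-twq/cmd` in instead.

```shell
# Extract the "Corrupt blocks" count from an fsck summary (sample data).
fsck_summary=$(cat <<'EOF'
Status: HEALTHY
 Total blocks (validated):	324 (avg. block size 87712 B)
 Corrupt blocks:	0
EOF
)

# Split on ':' and strip whitespace from the count field.
corrupt=$(printf '%s\n' "$fsck_summary" \
  | awk -F: '/Corrupt blocks/ {gsub(/[[:space:]]/, "", $2); print $2}')

if [ "$corrupt" -gt 0 ]; then
  echo "ALERT: $corrupt corrupt block(s)"
else
  echo "no corrupt blocks"
fi
```

A cron job could run this check periodically and alert whenever the count is non-zero.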
Listing corrupted file blocks (-list-corruptfileblocks)
[hadoop-twq@master ~]$ hdfs fsck /user/hadoop-twq/cmd -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2Fuser%2Fhadoop-twq%2Fcmd
The filesystem under path '/user/hadoop-twq/cmd' has 0 CORRUPT files
The command above lists the corrupted blocks under a directory; here it reports none, because there are currently no bad blocks.
Handling corrupted files
Move corrupted files to the /lost+found directory (-move)
hdfs fsck /user/hadoop-twq/cmd -move
Delete files with corrupted data blocks (-delete)
hdfs fsck /user/hadoop-twq/cmd -delete
Checking the status of and listing all files (-files)
Execute the following command:
hdfs fsck /user/hadoop-twq/cmd -files
The results show the following:
The above command checks every file under the specified directory and prints, for each file, its data blocks and the replication count of each block.
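One practical use of this per-file output is spotting under-replicated files. The sketch below parses illustrative sample lines of `-files` output (the exact line format can vary by Hadoop version); on a real cluster you would pipe the actual command output instead of the heredoc.

```shell
# Sample `hdfs fsck ... -files` output (illustrative).
fsck_files=$(cat <<'EOF'
/user/hadoop-twq/cmd/big_file.txt 220262 bytes, 2 block(s):  Under replicated BP-1639452328-192.168.126.130-1525478508894:blk_1073744150_3326. Target Replicas is 2 but found 1 live replica(s).
/user/hadoop-twq/cmd/parameter_test.txt 1024 bytes, 1 block(s):  OK
EOF
)

# Keep only lines flagged "Under replicated" and print the file path,
# which is the first whitespace-separated field of each file line.
under=$(printf '%s\n' "$fsck_files" | awk '/Under replicated/ {print $1}')
printf '%s\n' "$under"
```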
Checking and printing files open for write (-openforwrite)
Execute the following command to check which files under the given path are currently being written to:
hdfs fsck /user/hadoop-twq/cmd -openforwrite
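fsck marks such files with an OPENFORWRITE status in its output, so a script can count them. The sample lines in the heredoc below are illustrative; pipe the real command output in on a live cluster.

```shell
# Sample `hdfs fsck ... -openforwrite` output (illustrative).
fsck_open=$(cat <<'EOF'
/user/hadoop-twq/cmd/streaming.log 0 bytes, 1 block(s), OPENFORWRITE:  OK
/user/hadoop-twq/cmd/parameter_test.txt 1024 bytes, 1 block(s):  OK
EOF
)

# Count the lines carrying the OPENFORWRITE marker.
open_count=$(printf '%s\n' "$fsck_open" | grep -c 'OPENFORWRITE')
echo "files open for write: $open_count"
```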
Printing a file's block report (-blocks)
Execute the following command to view the details of all blocks of a specified file; -blocks must be used together with -files:
hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks
The results are as follows:
If we additionally pass -locations, the location of every data block is also printed. The command is as follows:
hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks -locations
The results are as follows:
If we additionally pass -racks, the rack location of every data block is also printed. The command is as follows:
hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks -locations -racks
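With -racks, each replica location is prefixed with a rack path such as /default-rack, which makes it possible to tally replicas per rack and sanity-check rack spread. The block line in the heredoc below is an illustrative sample of this format, not real cluster output.

```shell
# Sample block line from `hdfs fsck ... -blocks -locations -racks` (illustrative).
fsck_racks=$(cat <<'EOF'
0. BP-1639452328-192.168.126.130-1525478508894:blk_1073744150_3326 len=134217728 Live_repl=2 [/default-rack/192.168.126.130:50010, /default-rack/192.168.126.131:50010]
EOF
)

# Extract each "/rack/host:port" token, keep only the rack component,
# then count replicas per rack.
rack_counts=$(printf '%s\n' "$fsck_racks" \
  | grep -o '/[^/ ]*/[0-9.]*:[0-9]*' \
  | awk -F/ '{print "/" $2}' \
  | sort | uniq -c)
printf '%s\n' "$rack_counts"
```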
The results are as follows:
Usage scenarios for hdfs fsck
Scenario 1
When we execute the following command:
hdfs fsck /user/hadoop-twq/cmd
we can view the health information of the /user/hadoop-twq/cmd directory:
We can see that two files have blocks with fewer replicas than their target replication factor. We can reset the replication factor of these two files with the following commands:
## reset the replication factor of big_file.txt to 1
hadoop fs -setrep -w 1 /user/hadoop-twq/cmd/big_file.txt
## reset the replication factor of parameter_test.txt to 1
hadoop fs -setrep -w 1 /user/hadoop-twq/cmd/parameter_test.txt
The -w flag in the commands above makes them wait until the target replication factor has actually been reached, so with this flag the commands can take a long time to finish.
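When several files need their replication reset, a small loop saves typing. The sketch below only prints the commands as a dry run; on a live cluster you would drop the printf wrapper and execute `hadoop fs -setrep -w` directly.

```shell
# Dry run: print one `hadoop fs -setrep` command per file.
target=1
cmds=$(for f in /user/hadoop-twq/cmd/big_file.txt /user/hadoop-twq/cmd/parameter_test.txt; do
  printf 'hadoop fs -setrep -w %s %s\n' "$target" "$f"
done)
printf '%s\n' "$cmds"
```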
After the commands above complete, execute the following command again:
hdfs fsck /user/hadoop-twq/cmd
The results are as follows:
Scenario 2
When we visit the HDFS web UI, the following warning message appears:
It shows that a data block has been lost. We can execute the following command to determine which file the lost block belongs to:
[hadoop-twq@master ~]$ hdfs fsck / -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2F
The list of corrupt files under path '/' are:
blk_1073744153  /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml
The filesystem under path '/' has 1 CORRUPT files
We find that the lost block is blk_1073744153, and that it belongs to the file /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml.
This scenario occurs when a data block no longer exists on any DataNode, but the NameNode's metadata still records it. We can execute the following command to delete this useless block information:
[hadoop-twq@master ~]$ hdfs fsck /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/ -delete
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&delete=1&path=%2Ftmp%2Fhadoop-yarn%2Fstaging%2Fhistory%2Fdone_intermediate%2Fhadoop-twq
FSCK started by hadoop-twq (auth:SIMPLE) from /192.168.126.130 for path /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq at Tue Mar 05 19:18:00 EST 2019
....................................................................................................
..
/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml: CORRUPT blockpool BP-1639452328-192.168.126.130-1525478508894 block blk_1073744153
/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml: MISSING 1 blocks of total size 220262 B
....................................................................................................
....................................................................................................
........................
Status: CORRUPT
 Total size:    28418833 B
 Total dirs:    1
 Total files:   324
 Total symlinks:        0
 Total blocks (validated):      324 (avg. block size 87712 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:      1 (0.30864197 %)
  dfs.namenode.replication.min: 1
  CORRUPT FILES:        1
  MISSING BLOCKS:       1
  MISSING SIZE:         220262 B
  CORRUPT BLOCKS:       1
  ********************************
 Minimally replicated blocks:   323 (99.69136 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     0.99691355
 Corrupt blocks:                1
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          2
 Number of racks:               1
FSCK ended at Tue Mar 05 19:18:01 EST 2019 in 215 milliseconds
Then execute:
[hadoop-twq@master ~]$ hdfs fsck / -list-corruptfileblocks Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2F The filesystem under path '/' has 0 CORRUPT files
There are now no missing data blocks; they have been deleted. If we refresh the web UI, the warning message is gone as well:
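The two steps of this scenario can be combined into one sketch: parse the `-list-corruptfileblocks` output to get the affected file paths, then issue a `-delete` per file. The heredoc below reuses the sample output shown earlier, and the final `echo` keeps this a dry run; drop it to actually delete on a live cluster.

```shell
# Sample `-list-corruptfileblocks` output (from the transcript above).
corrupt_list=$(cat <<'EOF'
The list of corrupt files under path '/' are:
blk_1073744153	/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml
The filesystem under path '/' has 1 CORRUPT files
EOF
)

# Lines starting with a block id carry the file path in the second field.
files=$(printf '%s\n' "$corrupt_list" | awk '/^blk_/ {print $2}')

# Dry run: print the delete command for each corrupt file.
for f in $files; do
  echo hdfs fsck "$f" -delete
done
```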