The fsck command in HDFS (checking the health of data blocks)

HDFS provides the fsck command to check the health of HDFS files and directories and to view block information such as block locations.

If we run hdfs fsck on the master machine with no arguments, we can see the usage of this command:

[hadoop-twq@master ~]$ hdfs fsck
Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-includeSnapshots] [-storagepolicies] [-blockId <blk_Id>]
	<path>	start checking from this path
	-move	move corrupted files to /lost+found
	-delete	delete corrupted files
	-files	print out files being checked
	-openforwrite	print out files opened for write
	-includeSnapshots	include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
	-list-corruptfileblocks	print out list of missing blocks and files they belong to
	-blocks	print out block report
	-locations	print out locations for every block
	-racks	print out network topology for data-node locations
	-storagepolicies	print out storage policy summary for the blocks
	-blockId	print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)

 

Viewing the health of files and directories

Execute the following command:

hdfs fsck /user/hadoop-twq/cmd

You can view the health information of the /user/hadoop-twq/cmd directory:

An important field in this output is Corrupt blocks, which indicates the number of corrupted data blocks.

Viewing corrupted file blocks (-list-corruptfileblocks)

[hadoop-twq@master ~]$ hdfs fsck /user/hadoop-twq/cmd -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2Fuser%2Fhadoop-twq%2Fcmd
The filesystem under path '/user/hadoop-twq/cmd' has 0 CORRUPT files

The above command lists the corrupted blocks under a directory; in this case the directory has no bad blocks.
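When scripting around fsck, it can be handy to extract the corrupt block ids and the files they belong to from this output. Below is a minimal Python sketch; it assumes the output format shown by `-list-corruptfileblocks` (one `blk_<id><TAB><path>` line per corrupt block, plus connection and summary lines), and the function name is ours, not part of Hadoop:

```python
import re

def parse_corrupt_file_blocks(fsck_output):
    """Parse `hdfs fsck <path> -list-corruptfileblocks` output.

    Returns a list of (block_id, file_path) pairs, one per corrupt block.
    Assumes each corrupt-block line is a block id, a tab, then the path.
    """
    pairs = []
    for line in fsck_output.splitlines():
        match = re.match(r"^(blk_\d+)\t(\S+)$", line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs

# Sample text in the same shape as the fsck output shown in this article:
sample = """\
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2F
The list of corrupt files under path '/' are:
blk_1073744153\t/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml
The filesystem under path '/' has 1 CORRUPT files
"""
print(parse_corrupt_file_blocks(sample))
```

A wrapper like this can feed the affected paths into a recovery or cleanup step instead of eyeballing the report.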

Handling damaged files

Moving damaged files to the /lost+found directory (-move)

hdfs fsck /user/hadoop-twq/cmd -move

Deleting files with damaged data blocks (-delete)

hdfs fsck /user/hadoop-twq/cmd -delete

Checking the status of and listing all files (-files)

Execute the following command:

hdfs fsck /user/hadoop-twq/cmd -files

The results are as follows:

The above command checks all files in the specified directory and shows, for each file, the number of data blocks and the replication factor of each block.

Checking and printing files that are open for write (-openforwrite)

Execute the following command to check which files under the specified path are currently being written to:

hdfs fsck /user/hadoop-twq/cmd -openforwrite

Printing the block report of a file (-blocks)

Executing the following command shows the details of every block of the specified file; it must be used together with -files:

hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks

The results are as follows:

If we add -locations to the above command, the location of each data block is also printed:

hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks -locations

The results are as follows:

If we add -racks to the above command, the rack of each data block's location is also printed:

hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks -locations -racks

The results are as follows:

hdfs fsck usage scenarios

Scenario 1

When we execute the following command:

hdfs fsck /user/hadoop-twq/cmd

You can view the health information of the /user/hadoop-twq/cmd directory:

We can see that two files have fewer block replicas than required. We can use the following commands to reset the replication factor of these two files:

## set the replication factor of big_file.txt to 1
hadoop fs -setrep -w 1 /user/hadoop-twq/cmd/big_file.txt
## set the replication factor of parameter_test.txt to 1
hadoop fs -setrep -w 1 /user/hadoop-twq/cmd/parameter_test.txt

In the commands above, the -w parameter tells the command to wait until the specified replication factor is reached; with this parameter the command may take a long time to finish.
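If many files are under-replicated, you can extract their paths from an `hdfs fsck <path> -files` report and feed them to `hadoop fs -setrep` instead of fixing them one by one. A minimal Python sketch follows; the sample line format is an assumption modeled on a typical fsck -files report ("Under replicated ... Target Replicas is N but found M replica(s)"), not output captured from this cluster:

```python
def find_under_replicated(fsck_files_output):
    """Scan `hdfs fsck <path> -files` output for under-replicated files.

    Assumes each affected file's line starts with the file path and
    contains the phrase 'Under replicated' (typical fsck report style).
    """
    files = []
    for line in fsck_files_output.splitlines():
        if "Under replicated" in line:
            files.append(line.split()[0])  # the path is the first token
    return files

# Hypothetical sample lines in the style of an fsck -files report:
sample = (
    "/user/hadoop-twq/cmd/big_file.txt 134217728 bytes, 1 block(s): "
    "Under replicated BP-1639452328-192.168.126.130-1525478508894:blk_1073744100_3276. "
    "Target Replicas is 2 but found 1 replica(s).\n"
    "/user/hadoop-twq/cmd/word.txt 85 bytes, 1 block(s):  OK\n"
)
print(find_under_replicated(sample))
```

Each returned path could then be passed to a `hadoop fs -setrep -w <n> <path>` call from a shell loop or `subprocess`.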

After the above commands complete, execute the following command again:

hdfs fsck /user/hadoop-twq/cmd

The results are as follows:

Scenario 2

When we visit the HDFS WEB UI, the following warning message appears:

 

It indicates that a data block has been lost. We can execute the following command to determine which file the missing block belongs to:

[hadoop-twq@master ~]$ hdfs fsck / -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2F
The list of corrupt files under path '/' are:
blk_1073744153	/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml
The filesystem under path '/' has 1 CORRUPT files

  

We find that block blk_1073744153 is missing, and that it belongs to the file /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml.

This scenario occurs when the data block no longer exists on any DataNode, but the block's metadata still exists on the NameNode. We can execute the following command to delete the useless block information:

[hadoop-twq@master ~]$ hdfs fsck /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/ -delete
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&delete=1&path=%2Ftmp%2Fhadoop-yarn%2Fstaging%2Fhistory%2Fdone_intermediate%2Fhadoop-twq
FSCK started by hadoop-twq (auth:SIMPLE) from /192.168.126.130 for path /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq at Tue Mar 05 19:18:00 EST 2019
....................................................................................................
..
/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml: CORRUPT blockpool BP-1639452328-192.168.126.130-1525478508894 block blk_1073744153

/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml: MISSING 1 blocks of total size 220262 B...................................................................................................
....................................................................................................
........................Status: CORRUPT
 Total size:	28418833 B
 Total dirs:	1
 Total files:	324
 Total symlinks:		0
 Total blocks (validated):	324 (avg. block size 87712 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:	1 (0.30864197 %)
  dfs.namenode.replication.min:	1
  CORRUPT FILES:	1
  MISSING BLOCKS:	1
  MISSING SIZE:		220262 B
  CORRUPT BLOCKS: 	1
  ********************************
 Minimally replicated blocks:	323 (99.69136 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	1
 Average block replication:	0.99691355
 Corrupt blocks:		1
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		2
 Number of racks:		1
FSCK ended at Tue Mar 05 19:18:01 EST 2019 in 215 milliseconds
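The Status line and the counters in a report like the one above can also be checked programmatically, for example to alert when a scheduled fsck finds corruption. Below is a minimal sketch; it assumes the summary layout shown above (a "Status:" line plus "key: value" counter lines), and the function name is ours:

```python
def parse_fsck_summary(report):
    """Extract the overall status and corruption counters from an fsck report.

    Assumes the summary layout shown above: a 'Status:' line plus
    counter lines such as 'MISSING BLOCKS:' and 'Corrupt blocks:'.
    """
    summary = {}
    for line in report.splitlines():
        line = line.strip()
        if "Status:" in line:
            summary["status"] = line.split("Status:")[1].strip()
        elif line.startswith("MISSING BLOCKS:"):
            summary["missing_blocks"] = int(line.split(":")[1].strip())
        elif line.startswith("Corrupt blocks:"):
            summary["corrupt_blocks"] = int(line.split(":")[1].strip())
    return summary

# Sample text in the same shape as the report above:
sample = """\
........................Status: CORRUPT
 Total size:\t28418833 B
 Total files:\t324
  MISSING BLOCKS:\t1
 Corrupt blocks:\t\t1
"""
print(parse_fsck_summary(sample))
```

A cron job could run `hdfs fsck /`, pass the output through this parser, and raise an alert whenever the status is not HEALTHY.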

  Then execute:

[hadoop-twq@master ~]$ hdfs fsck / -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files

  

There is no longer a missing data block; it has been deleted. If we refresh the WEB UI, the warning message is also gone:

 


Origin: www.cnblogs.com/tesla-turing/p/11487899.html