Hadoop archives for small files

1. Drawbacks of storing small files in HDFS
Every file and block in HDFS is represented as metadata held in the NameNode's memory, so storing many small files is very inefficient: a large number of small files will consume most of the NameNode's memory. Note, however, that disk usage depends on the actual file size, not the block size: a 1 MB file stored with a 128 MB block size still occupies only 1 MB of disk space.
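To get a feel for the scale of the problem, here is a rough back-of-envelope estimate. The figure of ~150 bytes of NameNode heap per namespace object is a commonly cited rule of thumb, not an exact number, and the file count below is made up for illustration:

```shell
# Rough NameNode heap estimate for many small files.
# Assumption: ~150 bytes of heap per namespace object (file or block),
# a commonly cited rule of thumb.
FILES=10000000                  # ten million small files (hypothetical)
OBJECTS=$((FILES * 2))          # each small file ~ 1 file object + 1 block object
BYTES=$((OBJECTS * 150))
echo "~$((BYTES / 1024 / 1024)) MB of NameNode heap"   # ~2861 MB
```

Archiving those ten million files into a handful of HAR part files would shrink the block count, and with it the NameNode memory footprint, dramatically.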
2. One solution: Hadoop archive files (HAR)
A HAR file is Hadoop's file-archiving tool: it packs many files into HDFS blocks, reducing NameNode memory usage while still allowing transparent access to the individual files. Internally, the files in an archive remain separate, but to the NameNode the archive appears as a single unit, which is what reduces its memory consumption.
3. Hands-on example
(1) Start YARN (archiving runs as a MapReduce job, so YARN must be up)

[linyouyi@hadoop01 hadoop-2.7.7]$ sbin/start-yarn.sh
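Before archiving, you can confirm the YARN daemons came up. This is a common sanity check using `jps` from the JDK, not part of the original walkthrough:

```shell
# List running Java processes and keep only the YARN daemons;
# on this single-node setup both should appear.
jps | grep -E 'ResourceManager|NodeManager'
```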

(2) Archive the files
Archive everything under /user/linyouyi/input into an archive named input.har, and store the archive under /user/linyouyi/output. The -p flag specifies the parent path that the archived paths are relative to.

[linyouyi@hadoop01 hadoop-2.7.7]$ bin/hadoop archive -archiveName input.har -p /user/linyouyi/input /user/linyouyi/output

(3) View the archive
Listing the .har path directly shows the archive's internal files (_index, _masterindex, and the packed part-* data files); listing it through the har:// scheme shows the original archived files transparently.

[linyouyi@hadoop01 hadoop-2.7.7]$ hadoop fs -lsr /user/linyouyi/output/input.har
[linyouyi@hadoop01 hadoop-2.7.7]$ hadoop fs -lsr har:///user/linyouyi/output/input.har

(4) Extract files from the archive

[linyouyi@hadoop01 hadoop-2.7.7]$ hadoop fs -cp har:///user/linyouyi/output/input.har/* /user/linyouyi
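The `hadoop fs -cp` above copies the files out sequentially. For large archives, the Hadoop documentation also describes un-archiving in parallel with DistCp; a sketch following this example's paths (the destination directory is hypothetical):

```shell
# Parallel un-archiving: DistCp runs the copy as a MapReduce job,
# so many files are extracted concurrently.
hadoop distcp har:///user/linyouyi/output/input.har /user/linyouyi/input-restored
```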

 


Origin www.cnblogs.com/linyouyi/p/11310572.html