advantage
Wherein the 10k +, referring to each of the blocks must be> = 1M
Shortcoming
Low latency: means hadoop data processing units are in minutes, rather than as a storm is in units of milliseconds.
High Throughput: refers to the size of your file block distributed storage must be minimized is 1M, can not be small.
Small file access problem: If 200 million documents, although very large, but each file is very small, so each one will still consume NameNode memory, so this time is not conducive to NameNode, so when the files are very small when the treatment is not suitable for use hadoop.
The data is written: only supports write-once, late can not be changed, but you can append the last block, the data can be read many times.