Notes on HBase and Hadoop checksums

support checksums in HBase block cache
https://issues.apache.org/jira/browse/HBASE-5074

Store data and checksums together in block file
https://issues.apache.org/jira/browse/HDFS-2699

Skip checksum is broke; are we double-checksumming by default?

The discussion in HDFS-2699 lays this out clearly: current software and hardware read and write in 4096-byte units.
Facebook's Dhruba Borthakur says their HBase production clusters set io.bytes.per.checksum to 4096 (instead of the default 512).
Facebook's related settings: the HBase block size is 16K, the HDFS checksum chunk size is 4K, and the HDFS block size is 256 MB.
eBay's HBase production clusters reportedly set the HDFS block size to 128M (seen elsewhere).
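The Facebook numbers above would correspond to a Hadoop configuration fragment roughly like the following. This is only a sketch using the classic 0.20/CDH3-era property names quoted in the discussion; the values are the ones reported above, not recommendations:

```xml
<!-- Sketch: checksum chunk size and HDFS block size from the notes above -->
<property>
  <name>io.bytes.per.checksum</name>
  <value>4096</value> <!-- Facebook: 4096 instead of the default 512 -->
</property>
<property>
  <name>dfs.block.size</name>
  <value>268435456</value> <!-- 256 MB (Facebook); eBay reportedly uses 128 MB -->
</property>
```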
The HBASE-5074 feature is enabled by default and is controlled by the property "hbase.regionserver.checksum.verify":
HRegionServer:
    // do we use checksum verification in the hbase? If hbase checksum verification
    // is enabled, then we automatically switch off hdfs checksum verification.
    this.useHBaseChecksum = conf.getBoolean(
      HConstants.HBASE_CHECKSUM_VERIFICATION, true);
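Since the code above defaults the flag to true, explicitly disabling HBase-level checksums would look like the following hbase-site.xml fragment (a sketch, mirroring the property name in the code):

```xml
<!-- Sketch: turn off HBase checksum verification, falling back to HDFS checksums -->
<property>
  <name>hbase.regionserver.checksum.verify</name>
  <value>false</value>
</property>
```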
CDH3u3's support for this feature is still incomplete: the DFS client's verifyChecksum is set to false, so the DFS client no longer verifies the Data and Checksum it reads. But the DataNode still reads both the Data and the Checksum, so the two iops on the DataNode machine remain unavoidable; HDFS itself needs matching changes.
With HBASE-5074, an HBase block contains both Data and Checksum, and the Checksum is stored contiguously with the Data, so a single iop suffices.
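The inline-checksum layout can be illustrated with a minimal sketch. This is not HBase's actual HFileBlock code; it is a hypothetical example using java.util.zip.CRC32, showing the point above: each data chunk is immediately followed by its checksum, so one sequential read fetches both and verification is a single pass over the buffer.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch only: data laid out as [chunk][crc32][chunk][crc32]...
public class InlineChecksumSketch {

    // Chunk size; 4096 matches the io.bytes.per.checksum value discussed above.
    static final int BYTES_PER_CHECKSUM = 4096;

    // Append a CRC32 checksum after each chunk of data.
    static byte[] withChecksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        ByteBuffer out = ByteBuffer.allocate(data.length + chunks * 4);
        CRC32 crc = new CRC32();
        for (int off = 0; off < data.length; off += BYTES_PER_CHECKSUM) {
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            out.put(data, off, len);
            crc.reset();
            crc.update(data, off, len);
            out.putInt((int) crc.getValue());
        }
        return out.array();
    }

    // Verify data and checksums in one pass over the combined buffer:
    // one read of one file, instead of separate data and checksum reads.
    static boolean verify(byte[] block, int dataLength) {
        ByteBuffer in = ByteBuffer.wrap(block);
        CRC32 crc = new CRC32();
        byte[] chunk = new byte[BYTES_PER_CHECKSUM];
        int remaining = dataLength;
        while (remaining > 0) {
            int len = Math.min(BYTES_PER_CHECKSUM, remaining);
            in.get(chunk, 0, len);
            crc.reset();
            crc.update(chunk, 0, len);
            if ((int) crc.getValue() != in.getInt()) return false;
            remaining -= len;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = new byte[10000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        byte[] block = withChecksums(data);
        System.out.println(verify(block, data.length)); // prints "true"
        block[100] ^= 1; // corrupt one data byte
        System.out.println(verify(block, data.length)); // prints "false"
    }
}
```

By contrast, classic HDFS keeps checksums in a separate .meta file next to each block file, which is why the DataNode needs two reads per request.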


Reposted from bupt04406.iteye.com/blog/1606972