High-Availability HDFS & HBase Configuration in Practice

First, the version context: HDFS 2.7.1 and HBase 1.3.0. Settings may differ in other versions.

HDFS settings:

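dfs.datanode.synconclose: set to true. The default is false, in which case data may be lost when the system restarts or loses power.

After a write completes, the cached blocks are not written to disk immediately. To sync cached blocks to disk when the block file is closed, set dfs.datanode.synconclose to true in hdfs-site.xml. Changing this setting may affect performance.

dfs.datanode.sync.behind.writes = false. If set to true, the DataNode instructs the operating system to write all queued data to disk immediately after a write. This differs from the usual OS policy, which may wait up to 30 seconds before triggering a write to disk.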
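
As a rough sketch, the two settings above might appear in hdfs-site.xml as follows. The values mirror the ones discussed in this post; whether the extra durability is worth the write-latency cost depends on the workload.

```xml
<!-- hdfs-site.xml (sketch): sync block files to disk on close,
     keep sync-behind-writes at its default of false -->
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.sync.behind.writes</name>
  <value>false</value>
</property>
```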

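dfs.namenode.avoid.write.stale.datanode: set to true (false in the stock Hadoop 2.7.1 hdfs-default.xml)
dfs.namenode.avoid.read.stale.datanode: set to true (false in the stock Hadoop 2.7.1 hdfs-default.xml)
dfs.namenode.stale.datanode.interval: default 30000 ms (30 seconds)

With both switches set to true, a DataNode whose heartbeat has not been received for more than 30 seconds is marked stale, and the NameNode gives it the lowest priority for reads and writes.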

For more on what "stale" means, see the link below:

https://community.hortonworks.com/questions/2474/how-to-identify-stale-datanode.html
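
A minimal hdfs-site.xml sketch of the stale-DataNode settings discussed above. The interval is written out at its 30-second default purely for illustration; it is given in milliseconds.

```xml
<!-- hdfs-site.xml (sketch): deprioritize DataNodes with overdue heartbeats -->
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <!-- milliseconds; 30000 is the default, shown explicitly here -->
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>30000</value>
</property>
```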

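dfs.qjournal.write-txns.timeout.ms: the default is 20000 ms

The timeout for the NameNode writing to the JournalNodes; the default is 20 s. When the timeout is hit, the NameNode calls the terminate method of the ExitUtil class, which ends the process via System.exit(). The timeout may be caused by the network or by a full GC on the NameNode.

dfs.qjournal.start-segment.timeout.ms: the default is 20000 ms

The edit log is split into many pieces, each called a segment. When the NameNode starts writing a new edit log segment, it issues an RPC through the startLogSegment method; this parameter is the timeout for starting a new segment.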
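
If these timeouts turn out to be too tight (for example, when long NameNode GC pauses are observed), they could be raised in hdfs-site.xml along the following lines. The 60000 ms value is only an illustrative assumption and is not a recommendation from this post; raising the timeout hides the symptom, so the underlying GC or network issue is still worth investigating.

```xml
<!-- hdfs-site.xml (sketch): 60000 ms is an assumed example value,
     the default for both properties is 20000 ms -->
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>60000</value>
</property>
<property>
  <name>dfs.qjournal.start-segment.timeout.ms</name>
  <value>60000</value>
</property>
```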

dfs.client.read.shortcircuit = true

dfs.client.read.shortcircuit.buffer.size = 131072

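HDFS short-circuit reads. Normally, when a client reads from HDFS, the DataNode looks up the data on its local disk by block ID and sends it to the client over a TCP stream. If the client and the block are on the same node, however, the client can read the local block file directly without going through the DataNode's TCP connection. This is a short-circuit read;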
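
A sketch of how short-circuit reads might be configured in hdfs-site.xml (on both the DataNode and the client side, such as the HBase RegionServer). The first two values come from this post. Short-circuit reads also rely on a UNIX domain socket configured through dfs.domain.socket.path; the path below is an assumed example, not something this post specifies.

```xml
<!-- hdfs-site.xml (sketch): enable short-circuit local reads -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <!-- assumed example path; the directory must exist and be
       writable only by the HDFS user or root -->
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```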

dfs.datanode.failed.volumes.tolerated = <N>

Failed-disk tolerance: the number of failed volumes a DataNode can tolerate before it stops offering service;

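HDFS-5776 Hedged Read

When the first read exceeds the threshold, the client issues the same request to a second DataNode, which lowers overall latency;

dfs.client.hedged.read.threadpool.size = 50

dfs.client.hedged.read.threshold.millis = 100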
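
In XML form, the hedged-read settings above might look like the sketch below. These are client-side properties, so for HBase they would typically go into the configuration read by the RegionServer (assumed here to be hbase-site.xml). A thread pool size of 0 leaves hedged reads disabled.

```xml
<!-- client-side configuration (sketch), values taken from this post -->
<property>
  <!-- threads available for hedged reads; 0 disables the feature -->
  <name>dfs.client.hedged.read.threadpool.size</name>
  <value>50</value>
</property>
<property>
  <!-- wait this many milliseconds before starting a second read -->
  <name>dfs.client.hedged.read.threshold.millis</name>
  <value>100</value>
</property>
```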

dfs.datanode.fsdataset.volume.choosing.policy = AvailableSpaceVolumeChoosingPolicy

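The default is RoundRobinVolumeChoosingPolicy.

This is the policy a DataNode uses to choose which disk stores a block replica. The first option, RoundRobinVolumeChoosingPolicy, cycles through the disk directories in turn; the second, AvailableSpaceVolumeChoosingPolicy, prefers disks with more available space.

Hortonworks recommends keeping the default round-robin policy.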
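
For reference, a sketch of selecting the available-space policy in hdfs-site.xml. The property value is the fully qualified class name; omitting the property altogether keeps the round-robin default.

```xml
<!-- hdfs-site.xml (sketch): prefer volumes with more free space -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```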

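dfs.datanode.max.xcievers (renamed to dfs.datanode.max.transfer.threads in newer versions)

For a DataNode this is analogous to the file handle limit on Linux: when the number of connections to a DataNode exceeds the configured value, the DataNode refuses further connections;

The default is 4096;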

ipc.server.tcpnodelay

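The default is false.

Controls whether Nagle's algorithm is used on the Hadoop server side. Setting it to true disables the algorithm, which reduces latency but increases the number of small packets sent;

ipc.server.tcpnodelay and ipc.server.tcpkeepalive set how TCP connections are handled (Nagle's algorithm and keepalive);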
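
A sketch of disabling Nagle's algorithm for Hadoop RPC servers, assuming the setting is placed in core-site.xml. Only ipc.server.tcpnodelay is shown; this post does not give a target value, so true is chosen here to illustrate the low-latency trade-off described above.

```xml
<!-- core-site.xml (sketch): true sets TCP_NODELAY, i.e. turns Nagle's algorithm off -->
<property>
  <name>ipc.server.tcpnodelay</name>
  <value>true</value>
</property>
```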

HBase settings:

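hbase.hstore.compactionThreshold: the minimum number of store files at which a compaction starts. The default is 3 in HBase 1.3.0; consider increasing it;

hbase.hstore.compaction.kv.max: the number of KeyValues processed per batch during a flush or compaction. The default is 10; consider increasing it to 20;

hbase.hstore.flusher.count: the number of threads used for flushing. The default is 2; it can be raised to 6;

hbase.quota.enabled: the default is false; set it to true;

hbase.region.replica.replication.enabled and hbase.replication: both can be set to true to support region replicas; hbase.region.replica.replication.enabled defaults to false;

hbase.regionserver.storefile.refresh.period: when region replicas are enabled, secondary regions periodically fetch the latest store file list from the primary region. This setting is the period of that scan, in milliseconds. The default is 0, which disables it;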
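
The flush and compaction values suggested above could be written in hbase-site.xml roughly as follows. hbase.hstore.compactionThreshold is left out of the sketch because this post does not name a target value for it.

```xml
<!-- hbase-site.xml (sketch), values taken from this post -->
<property>
  <!-- KeyValues handled per batch during flush or compaction -->
  <name>hbase.hstore.compaction.kv.max</name>
  <value>20</value>
</property>
<property>
  <!-- number of memstore flush threads -->
  <name>hbase.hstore.flusher.count</name>
  <value>6</value>
</property>
```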
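
A sketch of the region replica settings in hbase-site.xml. The 30000 ms refresh period is an assumed example value, since this post only states that 0 disables the refresh. Replicas themselves are requested per table (via the table's region replication count), which this fragment does not cover.

```xml
<!-- hbase-site.xml (sketch): region replica support -->
<property>
  <name>hbase.replication</name>
  <value>true</value>
</property>
<property>
  <name>hbase.region.replica.replication.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- milliseconds; 30000 is an assumed example, 0 disables the refresh -->
  <name>hbase.regionserver.storefile.refresh.period</name>
  <value>30000</value>
</property>
```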

hbase.online.schema.update.enable

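Allows table schema changes to be applied online, without disabling the table first. The default is true;

HBase supports multiple WALs, which can be configured as follows: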

<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value>
</property>
<property>
  <name>hbase.wal.regiongrouping.strategy</name>
  <value>bounded</value>
</property>
<property>
  <name>hbase.wal.regiongrouping.numgroups</name>
  <value>2</value>
</property>

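Originally, an HBase RegionServer had a single WAL file, and the WAL entries of all regions were written to it. Since HBASE-5699, a RegionServer can be configured with multiple WAL files, which raises WAL write throughput and therefore lowers write latency. The hbase.wal.regiongrouping.strategy setting decides how regions are grouped when writing to the WALs; the default is bounded, which spreads the regions over a fixed number of WAL groups;

Pay attention to hbase.regionserver.maxlogs, which determines the maximum number of WAL files on a RegionServer. The default is 32; with the configuration above, keeping it at 32 is equivalent to 64 without multiwal;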
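
Following the arithmetic above (2 WAL groups at 32 files each behaving like 64), one way to keep the overall WAL count near the single-WAL default would be to halve the limit. This is a sketch under that assumption, not a setting this post prescribes.

```xml
<!-- hbase-site.xml (sketch): 2 WAL groups x 16 files is roughly the
     single-WAL default of 32; the value 16 is an assumption -->
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>16</value>
</property>
```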

hbase.hregion.memstore.mslab.enabled

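Enables the MemStore-Local Allocation Buffer (MSLAB) for the memstore. The default is true, and keeping the default is recommended: it manages memstore memory effectively and reduces memory fragmentation in write-heavy scenarios;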

zookeeper.session.timeout

The ZooKeeper session timeout. The default is 90000 ms; increasing it is recommended, for example to 120000;
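
In hbase-site.xml the suggested timeout might be written as in the sketch below. Note that the ZooKeeper server bounds the negotiated session timeout by its own maxSessionTimeout (20 × tickTime unless configured otherwise), so with an externally managed ensemble the server side may need a matching change.

```xml
<!-- hbase-site.xml (sketch): session timeout in milliseconds -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
```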
