HBase Client Retry Mechanism

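Background

HBase is a distributed database.

When a server goes down, how does the client detect that its data has moved?

When a region splits, how does the client find the newest region?

When a request fails because of network jitter or a similar transient problem, how does the client handle the failed request?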

First, we know that the HBase client has a configuration parameter for the number of retries.

hbase.client.retries.number is read in many classes.

One of them is RpcRetryingCallerFactory:

retries = hbase.client.retries.number

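We can also come at this from another angle (when one direction is hard to get into, try a different one).

If you have read through HBase's read/write path, you already know that the RPC caller is implemented by RpcRetryingCaller.

The class name alone tells us this is a caller that supports retries.

Let's look at how RpcRetryingCaller actually uses retries.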

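retries is used in the callWithRetries method (the method name shows that it exists specifically to handle retries).

callWithRetries code walkthrough

The key line in that method is: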

expectedSleep = callable.sleep(pause,tries);
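To make the shape of such a method concrete, here is a minimal sketch of a retry loop. It is not the HBase source: the class RetryLoopSketch, its callWithRetries signature, and the simple linear pause are all assumptions made for illustration. It only shows the general pattern of trying a call, sleeping after a failure, and giving up once the retry limit is reached.

```java
import java.util.concurrent.Callable;

public class RetryLoopSketch {

    // Hypothetical helper (not HBase's API): attempt the call up to maxRetries
    // times, sleeping between failed attempts, and rethrow after the last one.
    static <T> T callWithRetries(Callable<T> call, int maxRetries, long pauseMs)
            throws Exception {
        Exception last = null;
        for (int tries = 0; tries < maxRetries; tries++) {
            try {
                return call.call();            // success: return immediately
            } catch (Exception e) {
                last = e;                      // remember the most recent failure
                if (tries == maxRetries - 1) {
                    break;                     // no attempts left, do not sleep
                }
                // Assumed linear pause; HBase instead derives it via callable.sleep.
                Thread.sleep(pauseMs * (tries + 1));
            }
        }
        throw new Exception("failed after " + maxRetries + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // A call that fails twice and then succeeds.
        String result = callWithRetries(() -> {
            if (++attempts[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

The loop stops as soon as one attempt succeeds, so a healthy call pays no extra cost; only failing calls incur the pause.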

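Implementation of the callable.sleep method:

sleep has two parts. The first is getPauseTime.

The array index is the retry count.

1. Use the Nth retry to look up the corresponding array element, then multiply it by pause.

Purpose of the array: as retries accumulate, the interval between them should grow. As an analogy, picture a machine that takes raw parts and produces assembled components. When it breaks down, the worker may at first keep feeding parts into it, but the longer the fault lasts, the less often the failing operation should be repeated, and eventually it should stop altogether (the retry limit) rather than keep retrying at a high frequency.

2. Add a random value on top, so that all operations do not start retrying at the same moment.

The array used by this method: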
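The two steps above can be sketched as follows. This is an illustrative reconstruction rather than HBase's own code: the class BackoffSketch, the method names, and the jitterFraction parameter are assumptions, and the real jitter size may differ. The multipliers are the RETRY_BACKOFF values quoted in this article.

```java
import java.util.concurrent.ThreadLocalRandom;

public class BackoffSketch {

    // Multipliers copied from the RETRY_BACKOFF array quoted in this article.
    static final int[] RETRY_BACKOFF =
        {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Step 1: pause multiplied by the entry for this retry. The index is
    // clamped so retries beyond the array reuse the last multiplier.
    static long basePause(long pauseMs, int tries) {
        int idx = Math.max(0, Math.min(tries, RETRY_BACKOFF.length - 1));
        return pauseMs * RETRY_BACKOFF[idx];
    }

    // Step 2: add a random extra of at most jitterFraction of the base pause,
    // so that many clients do not retry at exactly the same instant.
    // jitterFraction is an assumed parameter for this sketch.
    static long pauseWithJitter(long pauseMs, int tries, double jitterFraction) {
        long base = basePause(pauseMs, tries);
        long jitter =
            (long) (base * jitterFraction * ThreadLocalRandom.current().nextDouble());
        return base + jitter;
    }

    public static void main(String[] args) {
        for (int tries = 0; tries < 5; tries++) {
            System.out.println("retry " + tries
                + ": base=" + basePause(100, tries) + "ms"
                + ", with jitter=" + pauseWithJitter(100, tries, 0.01) + "ms");
        }
    }
}
```

Keeping the deterministic part separate from the random part makes the schedule easy to reason about: the base pause is fully determined by pause and the retry count, and the jitter only spreads callers out around that value.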

/**
 * Retrying we multiply hbase.client.pause setting by what we have in this array until we
 * run out of array items. Retries beyond this use the last number in the array. So, for
 * example, if hbase.client.pause is 1 second, and maximum retries count
 * hbase.client.retries.number is 10, we will retry at the following intervals:
 * 1, 2, 3, 5, 10, 20, 40, 100, 100, 100.
 * With 100ms, a back-off of 200 means 20s
 */
public static final int RETRY_BACKOFF[] = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};
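As a worked example of the comment above, the following sketch prints the wait before each retry and the cumulative wait, ignoring jitter. The class RetryScheduleSketch and its methods are hypothetical helpers written for this article; only the array values come from the source.

```java
import java.util.Arrays;

public class RetryScheduleSketch {

    // Multipliers copied from the RETRY_BACKOFF array shown above.
    static final int[] RETRY_BACKOFF =
        {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Wait (in ms, without jitter) before each of the first `retries` retries.
    static long[] schedule(long pauseMs, int retries) {
        long[] waits = new long[retries];
        for (int i = 0; i < retries; i++) {
            // Retries beyond the end of the array reuse the last multiplier.
            int idx = Math.min(i, RETRY_BACKOFF.length - 1);
            waits[i] = pauseMs * RETRY_BACKOFF[idx];
        }
        return waits;
    }

    // Sum of all waits: roughly how long a caller blocks before giving up.
    static long totalWait(long pauseMs, int retries) {
        long total = 0;
        for (long w : schedule(pauseMs, retries)) {
            total += w;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(schedule(1000, 10)));
        System.out.println("total wait (ms): " + totalWait(1000, 10));
    }
}
```

With a 1 second pause and 10 retries, the waits are 1, 2, 3, 5, 10, 20, 40, 100, 100, 100 seconds, which add up to 381 seconds of sleeping before the client gives up. With a 100 ms pause the same schedule totals 38.1 seconds.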

gloria_y
