Foreword
National Day is approaching and my project has just wrapped up, so during the recent review period I received a task: research XXL-JOB. Getting to read source code on "official time" made me quite happy, so without further ado, let's get to it.
This article is not a comprehensive walkthrough of the XXL-JOB source. Instead it focuses on a few interesting ideas I ran into while reading it. Corrections from more experienced readers are very welcome.
XXL-JOB Overview
XXL-JOB is a lightweight distributed task scheduling platform. Its core design goals are rapid development, a gentle learning curve, light weight, and easy extension; it is open source, already running in production at many companies, and works out of the box.
XXL-JOB is split into a scheduling center and executors. The scheduling center is responsible for task management and scheduling, executor management, and log management; executors are responsible for running tasks and calling back with the execution results.
Task scheduling - a "time-wheel-like" implementation
The time wheel
The time wheel here comes from Netty's HashedWheelTimer. It is a ring structure, analogous to a clock face: the ring is divided into many buckets, and each bucket stores the tasks due at that tick in a List. As time passes, a pointer advances one bucket per tick and executes all the due tasks in the bucket it lands on. Which bucket a task goes into is decided by taking its due time modulo the wheel size. The principle is similar to HashMap: adding a new task corresponds to put, and the List resolves hash conflicts.
Take the figure above as an example: suppose each bucket represents 1 second, so the pointer completes a rotation in 8s, and suppose the pointer currently points at 0. A task to be scheduled 3s from now should obviously go into bucket (0 + 3 = 3) and run once the pointer has advanced 3 ticks. A task due in 10s, however, should run only after the pointer completes a full revolution and then 2 more ticks, so it is placed in bucket 2 with round = 1 recorded alongside it. A task is executed only when its round is 0; when the pointer visits a bucket, every other task in that bucket has its round decremented by 1.
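The placement rule above can be sketched as two lines of arithmetic. This is a minimal illustration, not XXL-JOB or Netty code; the class and method names are made up for the example:

```java
// Minimal sketch (not XXL-JOB/Netty source): where a task lands in a
// HashedWheelTimer-style wheel with 8 one-second buckets.
public class TimeWheelMath {
    static final int WHEEL_SIZE = 8; // 8 buckets, 1 second each

    // Bucket index for a task due in delaySeconds, pointer currently at current.
    static int bucketFor(int current, int delaySeconds) {
        return (current + delaySeconds) % WHEEL_SIZE;
    }

    // How many full revolutions the pointer makes before the task is due.
    static int roundsFor(int delaySeconds) {
        return delaySeconds / WHEEL_SIZE;
    }

    public static void main(String[] args) {
        // Due in 3s from slot 0: bucket 3, executed on the first pass (round 0).
        System.out.println(bucketFor(0, 3) + ", round " + roundsFor(3));
        // Due in 10s from slot 0: bucket 2, but skipped once (round 1).
        System.out.println(bucketFor(0, 10) + ", round " + roundsFor(10));
    }
}
```

Running it reproduces the two cases from the example: the 3s task lands in bucket 3 with round 0, and the 10s task in bucket 2 with round 1.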
There is also an optimized "hierarchical time wheel" design; for details see https://cnkirito.moe/timer/.
XXL-JOB's "time wheel"
XXL-JOB switched its scheduler from Quartz to a self-developed polling scheme that looks very much like a time wheel: you can think of it as 60 buckets of one second each, but without the round concept.
See the figure below for the specifics.
XXL-JOB uses two threads for task scheduling, ringThread and scheduleThread, with the following roles:
1. scheduleThread: reads task info, pre-reading tasks that will trigger within the next 5s, and puts them into the time wheel.
2. ringThread: fetches and executes the tasks in the current bucket and the previous bucket.
- Now let's look at the source to see why this is only "time-wheel-like". The key lines are commented; please read the comments carefully.
// the ring structure
private volatile static Map<Integer, List<Integer>> ringData = new ConcurrentHashMap<>();

// the task's next trigger time (in seconds) % 60
int ringSecond = (int)((jobInfo.getTriggerNextTime()/1000)%60);

// put the task into the time wheel
private void pushTimeRing(int ringSecond, int jobId){
    // push async ring
    List<Integer> ringItemData = ringData.get(ringSecond);
    if (ringItemData == null) {
        ringItemData = new ArrayList<Integer>();
        ringData.put(ringSecond, ringItemData);
    }
    ringItemData.add(jobId);
}
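As an aside, the get/then/put pair in pushTimeRing is a check-then-act sequence. In XXL-JOB it is safe because only scheduleThread writes the ring, but if more writers were ever added, ConcurrentHashMap.computeIfAbsent would make the insert atomic. A sketch of that variant (my own, not XXL-JOB source):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch (not XXL-JOB source): the same push written with computeIfAbsent,
// which creates the bucket list atomically if it is absent.
public class RingPush {
    static final Map<Integer, List<Integer>> ringData = new ConcurrentHashMap<>();

    static void pushTimeRing(int ringSecond, int jobId) {
        ringData.computeIfAbsent(ringSecond, k -> new ArrayList<>()).add(jobId);
    }
}
```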
// fetch the tasks of two ticks at once
List<Integer> ringItemData = new ArrayList<>();
int nowSecond = Calendar.getInstance().get(Calendar.SECOND);
// also check one tick back, in case a slow pass skipped over a tick
for (int i = 0; i < 2; i++) {
    List<Integer> tmpData = ringData.remove( (nowSecond+60-i)%60 );
    if (tmpData != null) {
        ringItemData.addAll(tmpData);
    }
}
// run them
for (int jobId: ringItemData) {
    JobTriggerPoolHelper.trigger(jobId, TriggerTypeEnum.CRON, -1, null, null);
}
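To make the (nowSecond+60-i)%60 arithmetic concrete, here is a tiny sketch (my own helper, not XXL-JOB source) showing which two slots ringThread drains at a given second; note how the wrap-around at second 0 reaches back to slot 59:

```java
// Sketch: the two slots ringThread drains when the clock reads nowSecond —
// the current slot plus the one before it, with wrap-around at 0.
public class RingSlots {
    static int[] slotsToDrain(int nowSecond) {
        return new int[]{ (nowSecond + 60) % 60, (nowSecond + 60 - 1) % 60 };
    }

    public static void main(String[] args) {
        int[] s = slotsToDrain(0);
        System.out.println(s[0] + " and " + s[1]); // prints "0 and 59"
    }
}
```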
Consistent hashing in the routing strategies
- As we all know, when XXL-JOB runs a task, which executor actually executes it is decided by the routing strategy. One of those strategies is consistent hashing (see ExecutorRouteConsistentHash.java in the source), which naturally brings the consistent hash algorithm to mind.
- Consistent hashing is one way to solve load balancing in distributed systems: the hash function makes a fixed portion of requests always land on the same server, so each server handles a fixed share of requests (and can keep state about them), achieving load balancing.
- The common modulo approach (hash(e.g. user id) % number of servers) scales poorly: when a machine joins or leaves, most of the mapping between user ids and servers is invalidated. Consistent hashing improves on this with a hash ring.
- In practice, when there are only a few server nodes, consistent hashing suffers from skew. One fix is to add more machines, but machines cost money, so virtual nodes are added instead.
- For the underlying principles, see https://www.jianshu.com/p/e968c081f563.
- The figure below shows a hash ring with virtual nodes, where ip1-1 is a virtual node of ip1, ip2-1 of ip2, and ip3-1 of ip3.
As you can see, the key to consistent hashing is a hash function that spreads the virtual nodes and the hash results uniformly. Uniformity here means fewer hash collisions; for background on hash collisions, see HashMap or the Redis dictionary.
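A consistent-hash ring with virtual nodes is typically built on a sorted map: hash every virtual node onto the ring, then route a key to the first node clockwise from its hash. The sketch below is my own simplified illustration in the spirit of ExecutorRouteConsistentHash, not XXL-JOB's actual class; in particular I use a plain hashCode where XXL-JOB uses an md5-based hash:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of a consistent-hash ring with virtual nodes (simplified; XXL-JOB's
// ExecutorRouteConsistentHash uses an md5-based hash instead of hashCode).
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // Illustrative hash only; masked to an unsigned 32-bit range.
    static long hash(String key) {
        return key.hashCode() & 0xffffffffL;
    }

    void addNode(String address) {
        for (int i = 0; i < virtualNodes; i++) {
            // each physical address occupies many positions on the ring
            ring.put(hash(address + "#" + i), address);
        }
    }

    // Route to the first node clockwise from the key; wrap to the ring start.
    String route(String jobId) {
        long h = hash(jobId);
        SortedMap<Long, String> tail = ring.tailMap(h);
        Long slot = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(slot);
    }
}
```

The virtual nodes (address + "#" + i) are what smooth out the skew described above: each physical machine is scattered across many ring positions, so removing or adding one machine only remaps the keys adjacent to its positions.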
- XXL-JOB's consistent-hash hash function is as follows.
// convert the jobId to md5
// hashCode() is not used directly because md5 widens the hash range, reducing collisions
byte[] digest = md5.digest();

// 32-bit hash code
long hashCode = ((long) (digest[3] & 0xFF) << 24)
        | ((long) (digest[2] & 0xFF) << 16)
        | ((long) (digest[1] & 0xFF) << 8)
        | (digest[0] & 0xFF);
long truncateHashCode = hashCode & 0xffffffffL;
- The hash function above reminds me of HashMap's hash function:
f(key) = hash(key) & (table.length - 1)
// the >>> 16 lets both the high and low bits of hashCode() influence f(key),
// making the distribution more uniform and hash collisions less likely
hash(key) = (h = key.hashCode()) ^ (h >>> 16)
- Similarly, both the high and low bits of the jobId's md5 digest influence the hash result, which lowers the probability of collisions.
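The effect of the bit-spreading step is easy to demonstrate. In the sketch below (my own demo, not HashMap internals), two keys whose hashCodes differ only in the high bits would collide in a small table without the XOR, but land in different slots with it:

```java
// Sketch: HashMap-style bit spreading. tableLength must be a power of two,
// since indexing uses & (tableLength - 1) instead of %.
public class SpreadHash {
    static int hash(Object key) {
        int h = key.hashCode();
        return h ^ (h >>> 16); // fold the high bits into the low bits
    }

    static int index(Object key, int tableLength) {
        return hash(key) & (tableLength - 1);
    }
}
```

For example, 0x10000 and 0x20000 share the low 16 bits (all zero), so a plain `hashCode & 15` maps both to slot 0; after spreading they land in different slots.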
Sharded task execution - maintaining thread context
- Distributed execution of a single task is, in my view, the highlight of XXL-JOB. In day-to-day development many scheduled tasks run on a single machine; for tasks with large data volumes it is much better to have a distributed solution.
- The routing strategy for this is the sharding task. In the source the author introduces a "sharding broadcast" concept, which confused me at first but became clear as I read on.
- Anyone who has read the source has probably hit this moment of surprise: why isn't the routing strategy implemented? See below.
public enum ExecutorRouteStrategyEnum {
    FIRST(I18nUtil.getString("jobconf_route_first"), new ExecutorRouteFirst()),
    LAST(I18nUtil.getString("jobconf_route_last"), new ExecutorRouteLast()),
    ROUND(I18nUtil.getString("jobconf_route_round"), new ExecutorRouteRound()),
    RANDOM(I18nUtil.getString("jobconf_route_random"), new ExecutorRouteRandom()),
    CONSISTENT_HASH(I18nUtil.getString("jobconf_route_consistenthash"), new ExecutorRouteConsistentHash()),
    LEAST_FREQUENTLY_USED(I18nUtil.getString("jobconf_route_lfu"), new ExecutorRouteLFU()),
    LEAST_RECENTLY_USED(I18nUtil.getString("jobconf_route_lru"), new ExecutorRouteLRU()),
    FAILOVER(I18nUtil.getString("jobconf_route_failover"), new ExecutorRouteFailover()),
    BUSYOVER(I18nUtil.getString("jobconf_route_busyover"), new ExecutorRouteBusyover()),
    // where's the promised implementation??? It's actually null
    SHARDING_BROADCAST(I18nUtil.getString("jobconf_route_shard"), null);
- Digging further, I slowly worked it out: where do the sharding parameters get passed in the first place? See this section of the XxlJobTrigger.trigger function.
...
// sharding-broadcast routing takes this branch
if (ExecutorRouteStrategyEnum.SHARDING_BROADCAST == ExecutorRouteStrategyEnum.match(jobInfo.getExecutorRouteStrategy(), null)
        && group.getRegistryList() != null && !group.getRegistryList().isEmpty()
        && shardingParam == null) {
    for (int i = 0; i < group.getRegistryList().size(); i++) {
        // the last two arguments: i is this machine's index in the executor cluster,
        // group.getRegistryList().size() is the total number of executors
        processTrigger(group, jobInfo, finalFailRetryCount, triggerType, i, group.getRegistryList().size());
    }
}
...
- Following these parameters over RPC to the executor side, we arrive at JobThread.run, which is in charge of task execution, where we find the following code.
// the sharding-broadcast parameters are set into ShardingUtil
ShardingUtil.setShardingVo(new ShardingUtil.ShardingVO(triggerParam.getBroadcastIndex(), triggerParam.getBroadcastTotal()));
...
// pass the execution parameters to the jobHandler
handler.execute(triggerParamTmp.getExecutorParams())
- Looking at ShardingUtil finally reveals the trick. Here is the code.
public class ShardingUtil {
    // thread context
    private static InheritableThreadLocal<ShardingVO> contextHolder = new InheritableThreadLocal<ShardingVO>();

    // sharding parameter object
    public static class ShardingVO {
        private int index;  // sharding index
        private int total;  // sharding total
        // get/set omitted
    }

    // put the parameter object into the context
    public static void setShardingVo(ShardingVO shardingVo){
        contextHolder.set(shardingVo);
    }

    // take the parameter object out of the context
    public static ShardingVO getShardingVo(){
        return contextHolder.get();
    }
}
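Note the choice of InheritableThreadLocal rather than a plain ThreadLocal: a value set in a parent thread is copied into any thread that parent creates, so even if a handler forks worker threads, they can still see the sharding parameters. A small standalone demo of that behavior (my own, not XXL-JOB source):

```java
// Demo: InheritableThreadLocal propagates a value from the thread that sets it
// to threads created afterwards by that thread.
public class ContextDemo {
    static final InheritableThreadLocal<String> ctx = new InheritableThreadLocal<>();

    // Returns what a freshly spawned child thread observes in ctx.
    static String seenByChild() {
        final String[] seen = new String[1];
        Thread child = new Thread(() -> seen[0] = ctx.get());
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}
```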
- Clearly, the handler responsible for the sharded task, ShardingJobHandler, takes the sharding parameters out of the thread context. Here is its code:
@JobHandler(value="shardingJobHandler")
@Service
public class ShardingJobHandler extends IJobHandler {

    @Override
    public ReturnT<String> execute(String param) throws Exception {
        // sharding parameters
        ShardingUtil.ShardingVO shardingVO = ShardingUtil.getShardingVo();
        XxlJobLogger.log("sharding params: current shard index = {}, total shards = {}", shardingVO.getIndex(), shardingVO.getTotal());

        // business logic
        for (int i = 0; i < shardingVO.getTotal(); i++) {
            if (i == shardingVO.getIndex()) {
                XxlJobLogger.log("shard {} hit, start processing", i);
            } else {
                XxlJobLogger.log("shard {} ignored", i);
            }
        }
        return SUCCESS;
    }
}
- So distributed execution boils down to handing each executor two sharding parameters, index and total, as an identifier; each executor then partitions the task's data or logic by that identifier, achieving distributed operation.
- A side note: why are the sharding parameters injected through the context instead of being passed directly to execute?
1. Probably because only sharding tasks use these two parameters.
2. IJobHandler's execute takes only a String parameter.
Thoughts after reading the source code
- 1. Skimming the source, XXL-JOB indeed meets its design goals: rapid development, simple to learn, lightweight, and easy to extend.
- 2. As for the self-developed RPC, I have no particular opinion; real adoption should weigh the company's own RPC framework.
- 3. The author describes scheduling problems with Quartz that I still need to study in depth.
- 4. The framework's many accommodations for downtime, failures, timeouts, and other abnormal conditions are worth learning from.
- 5. I also still need to understand how the rolling logs and system logs are implemented.