Analysis of ElasticJob-Lite 3.x scheduled tasks


Introduction

ElasticJob-Lite is a distributed scheduling solution for the Internet ecosystem and massive tasks.

Unlike traditional timed tasks, it was designed from the start for high concurrency and complex business scenarios. Even with a large business volume and many servers, it schedules tasks well and makes the best use of server resources.

A first look at the architecture diagram

As the figure below shows, ElasticJob-Lite depends on ZooKeeper. If the master server goes down, a new master is automatically elected through ZooKeeper's election mechanism, so ElasticJob-Lite offers good scalability and availability.

This architecture diagram is the overall picture of how ElasticJob-Lite operates. It looks complicated, so we will break it into several smaller pieces and simplify the steps to make it easier to understand. First is registration:

Registration

When the scheduled-task App starts, it sends its information to ZooKeeper and creates an instance node on ZooKeeper that contains the App's information. That is registration.

There is another concept to understand here: sharding. The shard count is a configuration item. Simply put, it is the total number of times the scheduled task needs to be executed at the same time. Assume that both App1 and App2 in the figure below are registered and the scheduled task runs once an hour:

  • Shard count = 1: by definition the scheduled task needs to be executed once at a time, so at 3:00 pm App1 can run the scheduled task while App2 has nothing to run.
  • Shard count = 2: App1 can run the scheduled task, and App2 can also run it.
  • Shard count = 3: App1 runs the scheduled task twice, and App2 runs it once.

Generally speaking, the number of registered Apps = the shard count + 1. If you want only one App running a scheduled task, set the shard count to 1 and register 2 Apps: only App1 runs the scheduled task, and App2 acts as App1's backup. When App1 disconnects unexpectedly, App2 takes over the scheduled task seamlessly. So what does sharding have to do with registration? Why talk about sharding while discussing registration?

Because when an App registers, the nodes generated in ZooKeeper contain sharding information, and whenever a new App registers, the shard information in all nodes is updated. For example, if you want two machines to run the scheduled task, set the shard count to 2:

  1. App1 starts, registers and generates its node, and 0 and 1 are written into the node information; that is, App1 runs shard 0 and shard 1, which implements the "run twice" logic.
  2. App2 starts, registers and generates its node, and the shards are reassigned between the two nodes.
  3. App1's node is written with 0, meaning it runs shard 0; App2's node is written with 1, meaning it runs shard 1, which implements the "run once each" logic.
  4. As more nodes register, shards are continuously redistributed according to the configured strategy.

So far we have covered registration. Let's look at the second piece, the core of ElasticJob-Lite:

Listening

The registration above refers to registering the App/service, which is relatively coarse-grained. Listening here refers to behavior that is started by registration and reacts to changes in the content of the generated nodes. The sharding mentioned above also has a corresponding sharding listener: it watches for changes in the number of Apps/services and re-shards each App/service according to the configured strategy.

Listening targets ZooKeeper nodes. There are many ways a ZooKeeper node can change, such as a service coming online or the configuration being modified in the console, and any of them can trigger the listeners. Either way, the ZooKeeper node information is modified, so it is enough to register with ZooKeeper and watch its node changes.
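
To make this concrete, here is a minimal standalone sketch of watching a ZooKeeper path with Apache Curator, the client library ElasticJob uses (the startup code later shows Curator being used for leader election). The connection string and path are assumptions borrowed from the demo later in this article; ElasticJob-Lite wires up its own listeners internally, so this only illustrates the mechanism:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.CuratorCache;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class NodeWatchSketch {

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.builder()
                .connectString("localhost:2181")
                .retryPolicy(new ExponentialBackoffRetry(1000, 3))
                .build();
        client.start();

        // Watch everything under the (assumed) job path. Any create, update or delete of a
        // child node fires a callback - this is how ElasticJob-Lite reacts to config changes,
        // instances going online/offline, resharding flags, and so on.
        CuratorCache cache = CuratorCache.build(client, "/my-job/Myjob");
        cache.listenable().addListener((type, oldData, newData) ->
                System.out.println("event=" + type + " path="
                        + (newData != null ? newData.getPath() : oldData.getPath())));
        cache.start();

        Thread.sleep(Long.MAX_VALUE); // keep the process alive to observe events
    }
}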

Let's briefly walk through the execution flow on the right side of the ElasticJob-Lite dashed box. It covers the process from a service coming online to the service failing, and it is a more detailed version of the registration described in the previous section:

  1. Online registration: write the service information to ZooKeeper nodes
  2. Schedule trigger: the scheduled task is handed to the Quartz framework's scheduler, which controls starting and stopping the task (see the minimal Quartz sketch after this list)
  3. ZooKeeper leader election: ZooKeeper elects a leader node, which is responsible for deciding whether resharding is needed
  4. Sharding decision: the figure splits into two branches; on the left the leader decides that sharding is needed and the sharding flow runs, on the right the leader decides it is not needed and execution continues
  5. Execute the scheduled task
  6. Failure: the service goes offline or exits, and the corresponding node in ZooKeeper expires
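
Step 2 is ordinary Quartz scheduling. As a minimal standalone illustration (not ElasticJob's internal wiring, which creates and manages its Quartz scheduler for you), this is roughly what a cron-triggered Quartz job looks like; the cron expression matches the one used in the demo later:

import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzCronSketch {

    // The job class Quartz instantiates and runs on every fire
    public static class PrintJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            System.out.println("fired at " + context.getFireTime());
        }
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(PrintJob.class).withIdentity("Myjob").build();
        // Same cron syntax as the ElasticJob configuration: every 30 minutes
        Trigger trigger = TriggerBuilder.newTrigger()
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0/30 * * * ?"))
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}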

When an App/service fails, the leader triggers a resharding task. Some node still alive in ZooKeeper picks up the baton from the offline service and runs the scheduled tasks that the failed node should have executed. If that node was originally a backup (it held no shard), it now gets one shard; if it already had one shard, it now runs two. This is exactly the sharding behavior described in the registration section.

On the left of the ElasticJob-Lite dashed box, the listener block contains many listeners that are added during registration. Some of them are listed below:

  • Election listener - ElectionListener
  • Sharding listener - ShardingListener
  • Failover listener - FailoverListener
  • Shutdown listener - ShutdownListener
  • Cron modification listener - RescheduleListener, and so on.

As you can see, when ElasticJob starts it registers all of its information into ZooKeeper nodes. Every move of those nodes represents a change in some service and triggers the online Apps/services to adjust accordingly.

Logs

While a service runs it produces logs, which are stored in two places: log files and the database. In the figure below, the left side shows events being saved to the database (optional) and the right side shows execution logs being written to log files. I was confused when I first tried to understand this part; after going through the related material, I modified the figure as follows:

  1. Added a DB annotation to the Events block in the figure
  2. Removed the ELK dashed block

To explain: Events are actually just another kind of log. Without the DB label I found it hard to see why Events sits next to Logs and what it really means. Representing the DB with a default icon also clashes with the rest of the diagram, where everything else is spelled out clearly inside boxes; the sudden style change plus the implicit meaning makes the icon easy to miss and the block hard to interpret.

As for crossing out ELK: in my view, ELK reading the logs has nothing to do with ElasticJob itself. ElasticJob runs perfectly well without ELK, so there is no need to show it in the architecture diagram.

Console

The information shown in the console falls into two parts: one part is read from the ZooKeeper nodes, and the other is read from the Events stored in the database. They correspond to the green arrows pointing to Status and Events Statistics.

Operation means modifying the node information in ZooKeeper through the console; the backend services listen for the modification and react accordingly.

The job-editing page shown below is one kind of Operation, namely a configuration change. The console reads the information from ZooKeeper and displays it; modifying it on the page updates the ZooKeeper node, and the backend scheduled tasks adjust accordingly.
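
To see exactly what the console edits: the job configuration is stored as YAML in the job's /config node (the listener code later unmarshals that very value). Below is a minimal Curator sketch that reads this node, with the connection string and path assumed from the demo below; in practice you would change the configuration through the console or the ElasticJob API rather than touching the node directly:

import java.nio.charset.StandardCharsets;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ConfigNodeSketch {

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();
        // Assumed path: /{namespace}/{jobName}/config from the demo below
        byte[] data = client.getData().forPath("/my-job/Myjob/config");
        // The console reads and rewrites this same YAML payload; listeners such as
        // RescheduleListenerManager react to the change
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}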

Code analysis

Code structure

ElasticJob-Lite is split into frontend code and backend code. The frontend responds to page clicks and sends requests to the backend; the backend implements the actual functionality.

The main logic of the ElasticJob-Lite backend lives in the ecosystem and core packages. The ecosystem package contains the ElasticJob Executor. The startup and listening implementations introduced in this article are both in the core package; if you want to dig deeper, setUpFacade in the core package is a good entry point.

Startup demo

Check the environment before starting:

  • Java 8 or higher
  • Maven 3.5.0 or higher
  • ZooKeeper 3.6.0 or higher

Below is the demo code:

import org.apache.commons.dbcp2.BasicDataSource;
import org.apache.shardingsphere.elasticjob.api.JobConfiguration;
import org.apache.shardingsphere.elasticjob.lite.api.bootstrap.impl.ScheduleJobBootstrap;
import org.apache.shardingsphere.elasticjob.reg.base.CoordinatorRegistryCenter;
import org.apache.shardingsphere.elasticjob.reg.zookeeper.ZookeeperConfiguration;
import org.apache.shardingsphere.elasticjob.reg.zookeeper.ZookeeperRegistryCenter;
import org.apache.shardingsphere.elasticjob.tracing.api.TracingConfiguration;

public class MyJobDemo {

    public static void main(String[] args) {
        // Entry point
        new ScheduleJobBootstrap(createRegistryCenter(), new MyJob(), createJobConfiguration()).schedule();
    }
    
    public static CoordinatorRegistryCenter createRegistryCenter() {
        CoordinatorRegistryCenter registryCenter = new ZookeeperRegistryCenter(
                new ZookeeperConfiguration("localhost:2181", "my-job"));
        // Initialize the registry center
        registryCenter.init();
        return registryCenter;
    }

    public static JobConfiguration createJobConfiguration() {
        // Basic job configuration
        JobConfiguration jobConfiguration = JobConfiguration.newBuilder("Myjob", 3)
                .cron("0 0/30 * * * ?").shardingItemParameters("0=Beijing,1=Shanghai,2=Guangzhou")
                .jobErrorHandlerType("LOG").jobShardingStrategyType("ROUND_ROBIN").overwrite(true).failover(true)
                .monitorExecution(true).build();
        // Event tracing to a database - optional
        TracingConfiguration tc = new TracingConfiguration("RDB", getDataSource());
        jobConfiguration.getExtraConfigurations().add(tc);
        return jobConfiguration;
    }

    public static BasicDataSource getDataSource() {
        BasicDataSource dataSource = new BasicDataSource();
        dataSource.setDriverClassName(com.mysql.cj.jdbc.Driver.class.getName());
        dataSource.setUrl("jdbc:mysql://localhost:3306/batch_log?useUnicode=true&useSSL=false&characterEncoding=UTF-8&allowMultiQueries=true&serverTimezone=GMT%2B8");
        dataSource.setUsername("root");
        dataSource.setPassword("your-password");
        return dataSource;
    }
}
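
The demo references a MyJob class that is not shown above. Below is a minimal sketch of what it could look like, assuming the simplest job type, SimpleJob, which the elasticjob-lite-core dependency below should provide; each instance only receives the shard items assigned to it, together with the parameters configured via shardingItemParameters:

import org.apache.shardingsphere.elasticjob.api.ShardingContext;
import org.apache.shardingsphere.elasticjob.simple.job.SimpleJob;

public class MyJob implements SimpleJob {

    @Override
    public void execute(final ShardingContext shardingContext) {
        // Each instance is handed only its own shard items, e.g. item 0 with parameter "Beijing"
        System.out.println("job=" + shardingContext.getJobName()
                + " shardItem=" + shardingContext.getShardingItem()
                + " parameter=" + shardingContext.getShardingParameter());
    }
}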

The Maven dependencies are as follows:


<properties>
    <java.version>1.8</java.version>
    <latest.release.version>3.0.0</latest.release.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-dbcp2</artifactId>
        <version>2.7.0</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>8.0.13</version>
    </dependency>
    <dependency>
        <groupId>org.apache.shardingsphere.elasticjob</groupId>
        <artifactId>elasticjob-lite-core</artifactId>
        <version>${latest.release.version}</version>
    </dependency>
</dependencies>

Nodes in ZooKeeper

The running of ElasticJob-Lite is a process of registering and listening on ZooKeeper nodes. So which nodes are there? Here are the main ones:

  • /config: the configuration node, which stores the shard count and other settings.

  • /instances: instance nodes, one per registered service. /{jobInstanceId} is generated by the service itself, e.g. the /192.128.0.1@-@123 node. The node stores the instance ID and the service's IP.

  • /leader: election-related nodes, used for leader election, coordinating sharding, and coordinating failover. For example, when resharding is needed, a /sharding/necessary node is added and deleted again once sharding is done; /failover works the same way for failover.

  • /servers: server nodes, generated per service IP. They control whether the scheduled tasks on that IP are enabled; writing a disabled status into the node stops the scheduled tasks under that IP.

  • /sharding: sharding nodes, generated when services register. With a shard count of 3, nodes 0, 1 and 2 are created; with 4 shards, 4 nodes. If 1 service is started with 4 configured shards, the /instance under all four shard nodes holds the same instanceId; with 2 services and 4 shards, nodes /0 and /1 hold one instance ID and /2 and /3 hold the other. In other words, after every registration it is already decided who executes which shards.

  • job_instance_id: the instance ID, generated at startup as IP + @-@ + a random number; it uniquely identifies an instance.

The figure below lists some of the ZooKeeper nodes by level; /xxx denotes a node and the black blocks show the node contents. /namespace/jobname: services that register under the same jobname at startup execute the same scheduled task.
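
Since the figure cannot be reproduced here, a rough way to see this layout for yourself is to list the children of the job node with Curator. A minimal sketch, with the connection string, namespace and job name assumed from the demo above:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ListJobNodesSketch {

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();
        String jobPath = "/my-job/Myjob"; // /{namespace}/{jobName} from the demo
        // Expected children after startup: config, instances, leader, servers, sharding
        for (String child : client.getChildren().forPath(jobPath)) {
            System.out.println(jobPath + "/" + child);
            for (String grandChild : client.getChildren().forPath(jobPath + "/" + child)) {
                System.out.println("    " + jobPath + "/" + child + "/" + grandChild);
            }
        }
    }
}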

Listener code

Walking through the main startup code helps us understand ElasticJob better. Below is the registration of startup information:

public void registerStartUpInfo(final boolean enabled) {
    // Start all listeners
    listenerManager.startAllListeners();
    // Elect the leader via Curator: /leader/election/latch
    leaderService.electLeader();
    // Persist the server: create the parent node /servers on first start, otherwise update transactionally
    serverService.persistOnline(enabled);
    // Persist the instance: /instances
    instanceService.persistOnline();
    if (!reconcileService.isRunning()) {
        // Reconcile sharding conflicts
        reconcileService.startAsync();
    }
}

Let's see how many listeners startAllListeners starts:

public void startAllListeners() {
    // Election listener
    electionListenerManager.start();
    // Sharding listener
    shardingListenerManager.start();
    // Failover listener
    failoverListenerManager.start();
    // Monitor execution listener
    monitorExecutionListenerManager.start();
    // Shutdown listener
    shutdownListenerManager.start();
    // Listener for the console's "trigger now" feature
    triggerListenerManager.start();
    // Cron expression modification listener
    rescheduleListenerManager.start();
    // Handles startup and contention node information
    guaranteeListenerManager.start();
    // Handles connection state: connect, reconnect
    jobNodeStorage.addConnectionStateListener(regCenterConnectionStateListener);
}

Sharding implementation

Two listeners are registered when sharding starts:

  1. Shard count change
  2. Server change

@Override
public void start() {
    // Shard count change listener
    addDataListener(new ShardingTotalCountChangedJobListener());
    // Server change listener
    addDataListener(new ListenServersChangedJobListener());
}

Shard-count changes and server changes trigger essentially the same logic, so let's look at how a shard-count change is handled:

class ShardingTotalCountChangedJobListener implements DataChangedEventListener {
    
    @Override
    public void onChange(final DataChangedEvent event) {
        // The change is under /config && the cached shard count is not 0
        if (configNode.isConfigPath(event.getKey()) && 0 != JobRegistry.getInstance().getCurrentShardingTotalCount(jobName)) {
            int newShardingTotalCount = YamlEngine.unmarshal(event.getValue(), JobConfigurationPOJO.class).toJobConfiguration().getShardingTotalCount();
            // New shard count != old shard count
            if (newShardingTotalCount != JobRegistry.getInstance().getCurrentShardingTotalCount(jobName)) {
                // Only the leader node may proceed
                if (!leaderService.isLeaderUntilBlock()) {
                    return;
                }
                // Add the /necessary node under /leader/sharding
                jobNodeStorage.createJobNodeIfNeeded(ShardingNode.NECESSARY);
                // Cache the new shard count locally
                JobRegistry.getInstance().setCurrentShardingTotalCount(jobName, newShardingTotalCount);
            }
        }
    }
}

From the code above we can see that sharding is not performed in real time; instead it is triggered asynchronously by adding a node:

  1. Check that the event from ZooKeeper is a configuration change and that the shard count is > 0
  2. Check whether the current node is the leader; only the leader may add the node and perform sharding
  3. If it is not the leader, return; if it is the leader, go to step 4
  4. Add a /necessary node under /leader/sharding and cache the shard count locally

After detecting the change, the listener is only responsible for creating the /necessary node. Before a scheduled task starts, it checks whether the /necessary node exists; if it does, sharding is performed first and then the task executes. Below is the sharding implementation:

public void shardingIfNecessary() {
    List<JobInstance> availableJobInstances = instanceService.getAvailableJobInstances();
    // Return unless sharding is needed and there are available instances
    if (!isNeedSharding() || availableJobInstances.isEmpty()) {
        return;
    }
    // Only the leader shards; other nodes block until sharding completes
    if (!leaderService.isLeaderUntilBlock()) {
        blockUntilShardingCompleted();
        return;
    }
    // If shard items are still being executed, wait for all of them to finish
    waitingOtherShardingItemCompleted();
    // Load the configuration
    JobConfiguration jobConfig = configService.load(false);
    int shardingTotalCount = jobConfig.getShardingTotalCount();
    log.debug("Job '{}' sharding begin.", jobName);
    // Set the sharding state to processing
    jobNodeStorage.fillEphemeralJobNode(ShardingNode.PROCESSING, "");
    // Reset the sharding nodes
    resetShardingInfo(shardingTotalCount);
    /**
     * Sharding strategies: AverageAllocationJobShardingStrategy (default) - distribute shard items evenly
     *                      OdevitySortByNameJobShardingStrategy           - sort IPs ascending or descending based on the parity of the job name's hash
     *                      RoundRobinByNameJobShardingStrategy            - round-robin over the server list based on the job name's hash
     */
    // added: sticky allocation based on load
    JobShardingStrategy jobShardingStrategy = JobShardingStrategyFactory.getStrategy(jobConfig.getJobShardingStrategyType());
    /**
     * 1. Check  "/"
     * 2. Create "/Myjob/sharding/0/instance"
     *           "/Myjob/sharding/1/instance"
     *           "/Myjob/sharding/2/instance"
     * 3. Delete "/Myjob/leader/sharding/necessary"
     * 4. Delete "/Myjob/leader/sharding/processing"
     */
    jobNodeStorage.executeInTransaction(getShardingResultTransactionOperations(jobShardingStrategy.sharding(availableJobInstances, jobName, shardingTotalCount)));
    log.debug("Job '{}' sharding complete.", jobName);
}
  1. The leader checks the /leader/sharding/necessary node to decide whether sharding is needed: if not, return; if yes, go to the next step
  2. Wait for other running scheduled executions to finish before sharding
  3. Load the configuration and set the sharding state in ZooKeeper to processing
  4. Build the node operations: first delete all existing nodes under /sharding, then create nodes for the new shard count. For example, changing the shard count from 1 to 2 deletes /sharding/0 and creates /sharding/0 and /sharding/1.
  5. The leader selects the allocation strategy and computes which shards are executed by which service instances, then writes the instance IDs into the shard nodes, e.g. the executing instance's ID is written into /sharding/0/instance. There are three built-in strategies - average allocation, IP ordering by name hash, and round robin by name - with average allocation as the default (see the simplified sketch below).
  6. Execute in a transaction:
    • delete the old shard nodes
    • create the new shard nodes
    • write the executing instance's ID into each shard node
    • delete the /Myjob/leader/sharding/necessary and /Myjob/leader/sharding/processing nodes

At this point, the entire sharding flow is complete.
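
To make step 5 above more concrete, here is a simplified standalone sketch of dealing shard items out to instances evenly. It is not ElasticJob's actual AverageAllocationJobShardingStrategy (which implements the JobShardingStrategy interface and assigns contiguous blocks of items); the class and method below are purely illustrative:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EvenShardingSketch {

    // Deal shard items 0..shardingTotalCount-1 out to the instances as evenly as possible
    static Map<String, List<Integer>> sharding(final List<String> instanceIds, final int shardingTotalCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String instanceId : instanceIds) {
            result.put(instanceId, new ArrayList<>());
        }
        for (int item = 0; item < shardingTotalCount; item++) {
            // Round-robin assignment; the result is what gets written into /sharding/{item}/instance
            result.get(instanceIds.get(item % instanceIds.size())).add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // 2 instances, 3 shards -> {192.168.0.1@-@1001=[0, 2], 192.168.0.2@-@1002=[1]}
        System.out.println(sharding(Arrays.asList("192.168.0.1@-@1001", "192.168.0.2@-@1002"), 3));
    }
}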

By analyzing the code behind the sharding feature, we have gained an initial understanding of how listening and change handling work in ElasticJob-Lite 3.x. Many other features follow implementation logic very similar to sharding, so once you understand sharding you can roughly understand the rest.

ElasticJob-Lite-UI console

Starting from version 3.0.0-alpha, ShardingSphere-ElasticJob splits the console management UI out into a separate project. Many tutorials online are still based on versions where the console had not been split out; the following guide is based on the latest 3.0.1 release of the ElasticJob management UI.

Deploying the ElasticJob-Lite-UI 3.x console

Deploying the new console is not particularly straightforward; it is recommended to refer to the link above before deploying.

Recommended reading

A detailed explanation of the snowflake algorithm

Full-link data governance based on traffic domains

A performance analysis and troubleshooting of an API with excessive response time

How InnoDB auto-increment locks work in MySQL

We're hiring

The Zhengcaiyun technical team (Zero) is a team full of passion, creativity and execution, based in picturesque Hangzhou. The team currently has more than 300 R&D partners, including "veterans" from Ali, Huawei and NetEase, as well as newcomers from Zhejiang University, University of Science and Technology of China, Hangzhou Dianzi University and other schools. Beyond daily business development, the team also explores and practices in fields such as cloud native, blockchain, artificial intelligence, low-code platforms, middleware, big data, material systems, engineering platforms, performance and experience, and visualization, has landed a series of internal technical products, and keeps exploring the new frontiers of technology. The team is also devoted to community building and currently contributes to many excellent open source communities such as Google Flutter, scikit-learn, Apache Dubbo, Apache RocketMQ, Apache Pulsar, CNCF Dapr, Apache DolphinScheduler, Alibaba Seata and more. If you want a change: you have been pushed around by things and want to start driving them; you have been told you need more ideas but cannot break out of the situation; you have the ability but nobody seems to need it; you want to build something but need a team behind you, yet there is no room for you to lead; you have the insight but there is always that thin layer of paper you cannot pierce... If you believe in the power of belief, believe that ordinary people can accomplish extraordinary things, and believe you can meet a better version of yourself; if you want to take part in the process of taking off with the business and personally push forward the growth of a technical team with deep business understanding, a sound technical system, technology that creates value, and spillover influence, then we should talk. Any time, we are waiting for you to write something and send it to [email protected]

WeChat public account

This article is published simultaneously on the Zhengcaiyun technical team's WeChat official account. You are welcome to follow it.


Source: juejin.im/post/7114816346969341960