Spring Boot - Integrating Elastic Job for Task Scheduling

Table of contents

1. Task scheduling

2. Elastic Job

3. Spring Boot integration with Elastic Job


1. Task scheduling

What is task scheduling?

Task scheduling is the process by which a system runs tasks at specified times in order to complete specific work automatically. Its purpose is to let the system finish tasks automatically and accurately, freeing up human resources.

For example: an e-commerce platform starts a flash-sale activity at a designated time without manual intervention; the system automatically sends SMS reminders to customers at a scheduled time; the system consolidates orders, financial data and other related data at a fixed time every month.

So what is distributed task scheduling?

A distributed system is an architecture in which different sub-services of the same system are deployed on different machines and interact over the network to complete the business processing of the whole system.

Distributed scheduling is task scheduling under such a distributed architecture, where a service often has multiple instances.

In such a system, different microservices make up the application and each microservice has multiple instances; task scheduling is then completed jointly by those instances.

The purpose of distributed task scheduling is to schedule tasks in parallel and improve scheduling and processing capacity; to allocate resources effectively so that processing capacity can scale elastically with the resources assigned; and to guarantee high availability of tasks. Multiple instances complete task scheduling together, so even if one instance goes down the others carry on, and a leader elected through Zookeeper prevents tasks from being executed repeatedly.

2. Elastic Job

Elastic Job is built on Quartz and Curator. It is a distributed scheduling solution aimed at Internet-scale systems and massive numbers of tasks, and it consists of two independent sub-projects, ElasticJob-Lite and ElasticJob-Cloud.

Through elastic scheduling, resource management and job governance it provides a distributed scheduling solution suited to Internet scenarios, and its open architecture design supports a diversified job ecosystem. All of its products use a unified job API, so developers only need to implement a job once and can deploy it anywhere.

The architecture of ElasticJob-Lite consists of the following parts:

APP: the application, which contains the business logic to be executed and the Elastic-Job-Lite component.

Elastic-Job-Lite: positioned as a lightweight, decentralized solution, it is responsible for task scheduling and for generating logs and scheduling records. Decentralized means there is no scheduling center: all job nodes are equal, and distributed coordination is done through Zookeeper.

Registry: the registration center. Zookeeper acts as the registry component of Elastic-Job-Lite and, by electing task instances, ensures that tasks are not executed repeatedly.

Console: an operations and maintenance platform where you can view log files and other related information.

Simply put, we write the task execution logic in the application instance and register it with the Elastic Job framework. At the configured time the framework schedules the task according to the configured sharding information; it locates the task in Zookeeper by its namespace and other information, elects a leader to decide which instances execute which shards, and finally lets those instances run the task automatically.

With Elastic Job we can implement distributed task scheduling, allocate resources effectively, improve task execution efficiency and avoid duplicate execution.

3. Spring Boot integration with Elastic Job

The environment requirements for using Elastic Job are as follows: a running Zookeeper instance is needed as the registry (the example below assumes it is available at localhost:2181).

Import the dependencies into the Spring Boot project:

    <dependencies>
        <!-- Spring Boot dependencies -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <!--elastic-job-lite-->
        <dependency>
            <groupId>com.dangdang</groupId>
            <artifactId>elastic-job-lite-spring</artifactId>
            <version>2.1.5</version>
        </dependency>
        <!--zookeeper curator-->
        <dependency>
            <groupId>org.apache.curator</groupId>
            <artifactId>curator-recipes</artifactId>
            <version>2.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.curator</groupId>
            <artifactId>curator-framework</artifactId>
            <version>2.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.curator</groupId>
            <artifactId>curator-client</artifactId>
            <version>2.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
            <version>3.4.5</version>
        </dependency>
        <!--mysql-druid-->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.37</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid-spring-boot-starter</artifactId>
            <version>1.1.10</version>
        </dependency>
        <!-- MyBatis-Plus starter -->
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.5.1</version>
        </dependency>
    </dependencies>

(1) Configure the registration center:

import com.dangdang.ddframe.job.reg.base.CoordinatorRegistryCenter;
import com.dangdang.ddframe.job.reg.zookeeper.ZookeeperConfiguration;
import com.dangdang.ddframe.job.reg.zookeeper.ZookeeperRegistryCenter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JobRegistryCenterConfig {
    // Zookeeper port
    private static final int ZOOKEEPER_PORT = 2181;
    private static final String JOB_NAMESPACE = "elastic-job-example-java";

    @Bean(initMethod = "init")
    public CoordinatorRegistryCenter setupRegistryCenter(){
        // Zookeeper configuration
        ZookeeperConfiguration zookeeper = new ZookeeperConfiguration("localhost:" + ZOOKEEPER_PORT, JOB_NAMESPACE);
        // Requested session timeout; values below Zookeeper's minimum (2 x tickTime, typically 4000 ms) are raised to that minimum by the server
        zookeeper.setSessionTimeoutMilliseconds(100);
        // Create the registry center
        return new ZookeeperRegistryCenter(zookeeper);
    }
}
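Optionally, instead of hard-coding the Zookeeper address, the connection values can be read from application.properties. A minimal sketch, assuming hypothetical property names elasticjob.zookeeper.server-list and elasticjob.zookeeper.namespace (use either this variant or the class above, not both):

import com.dangdang.ddframe.job.reg.base.CoordinatorRegistryCenter;
import com.dangdang.ddframe.job.reg.zookeeper.ZookeeperConfiguration;
import com.dangdang.ddframe.job.reg.zookeeper.ZookeeperRegistryCenter;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ExternalizedJobRegistryCenterConfig {

    // Hypothetical properties, e.g. in application.properties:
    // elasticjob.zookeeper.server-list=localhost:2181
    // elasticjob.zookeeper.namespace=elastic-job-example-java
    @Value("${elasticjob.zookeeper.server-list}")
    private String serverList;

    @Value("${elasticjob.zookeeper.namespace}")
    private String namespace;

    @Bean(initMethod = "init")
    public CoordinatorRegistryCenter registryCenter() {
        return new ZookeeperRegistryCenter(new ZookeeperConfiguration(serverList, namespace));
    }
}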

(2) Then, we prepare some data:

import com.baomidou.mybatisplus.annotation.TableField;
import com.baomidou.mybatisplus.annotation.TableId;
import com.baomidou.mybatisplus.annotation.TableName;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.io.Serializable;

@TableName(value ="FileCustom")
@Data
@AllArgsConstructor
@NoArgsConstructor
public class FileCustom implements Serializable {
    @TableId
    private Integer id;
    private String name;
    private String type;
    private String content;
    private boolean backup;

    @TableField(exist = false)
    private static final long serialVersionUID = 1L;
}


Business class (implemented with MyBatis-Plus): it queries all files of the given type that have not yet been backed up.

import com.baomidou.mybatisplus.core.conditions.query.QueryWrapper;
import com.baomidou.mybatisplus.extension.plugins.pagination.Page;
import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.util.List;
// plus imports for FileCustom, FileCustomMapper and FileCustomService from the project's own packages

@Service
public class FileCustomServiceImpl extends ServiceImpl<FileCustomMapper, FileCustom>
    implements FileCustomService {
    @Resource
    private FileCustomMapper filecustomMapper;

    @Override
    public List<FileCustom> getFileList(String fileType, int size) {
        // Pagination parameters
        Page<FileCustom> page = Page.of(1,size);
        QueryWrapper<FileCustom> queryWrapper = new QueryWrapper<FileCustom>()
                .eq("type", fileType)
                .eq("backup",false);
        filecustomMapper.selectPage(page,queryWrapper);
        return page.getRecords();
    }
}
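The FileCustomMapper and FileCustomService referenced above are not shown in the original post; minimal sketches of what they might look like under standard MyBatis-Plus conventions (names and packages are assumptions):

// FileCustomMapper.java -- MyBatis-Plus generates the basic CRUD SQL for FileCustom
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import org.apache.ibatis.annotations.Mapper;

@Mapper
public interface FileCustomMapper extends BaseMapper<FileCustom> {
}

// FileCustomService.java -- the interface implemented by FileCustomServiceImpl above
import com.baomidou.mybatisplus.extension.service.IService;

import java.util.List;

public interface FileCustomService extends IService<FileCustom> {
    // Returns up to 'size' files of the given type that have not yet been backed up
    List<FileCustom> getFileList(String fileType, int size);
}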

Why query by file type?

The system is deployed on two servers in a distributed manner, and each server runs an instance that performs the automatic file backup task. The job can therefore be divided into two sharding items, distributed between the two instances, with each item processing one type of file (text or image).

Therefore, sharding tasks sensibly (in general the sharding count should be no less than the number of servers, and preferably a multiple of it) maximizes the throughput of job execution.
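For example, if the same job were deployed on 2 servers with 4 sharding items, each instance would normally be assigned 2 items. A sketch of such a configuration, using the createJobConfiguration helper defined in step (4) below (the extra file types here are illustrative):

// 4 sharding items spread across 2 instances: each instance typically handles 2 file types
createJobConfiguration(FileBackupJob.class, "0 20 17 * * ?", 4, "0=txt,1=jpg,2=png,3=video");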

(3) Write the Elastic Job task execution logic:

SimpleJob form: simple implementation without any encapsulation.

import com.dangdang.ddframe.job.api.ShardingContext;
import com.dangdang.ddframe.job.api.simple.SimpleJob;
import com.seven.scheduler.entities.FileCustom;
import com.seven.scheduler.service.FileCustomService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

import javax.annotation.Resource;
import java.util.List;

/**
 * File backup job
 */
@Slf4j
@Component
public class FileBackupJob implements SimpleJob {

    @Resource
    private FileCustomService filecustomService;

    // Number of files to back up per execution
    private final int FETCH_SIZE = 1;

    @Override
    public void execute(ShardingContext shardingContext) {
        // Get the sharding parameter
        // Here the sharding parameter is the file type; the job is sharded by type
        // "0=txt,1=jpg"
        String jobParameter = shardingContext.getShardingParameter();
        log.info("Sharding item: " + shardingContext.getShardingItem() + ", processing " + jobParameter + " files");
        // Fetch the files to back up
        List<FileCustom> list = filecustomService.getFileList(jobParameter,FETCH_SIZE);
        // Back up the files
        backupFiles(list);
    }

    // Mark the files as backed up
    public void backupFiles(List<FileCustom> list){
        list.forEach(fileCustom -> {
            fileCustom.setBackup(true);
            filecustomService.updateById(fileCustom);
            log.info(fileCustom.getName() + " backed up");
        });
    }
}

DataflowJob form: used to process data flows.

Whether streaming is used is configured through DataflowJobConfiguration. With streaming enabled, the job keeps calling fetchData and processData until fetchData returns null or an empty collection, and only then does the current execution stop; without streaming, fetchData and processData are each executed only once per job run, after which the execution completes.

import com.dangdang.ddframe.job.api.ShardingContext;
import com.dangdang.ddframe.job.api.dataflow.DataflowJob;
import com.seven.scheduler.entities.FileCustom;
import com.seven.scheduler.service.FileCustomService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

import javax.annotation.Resource;
import java.util.List;

@Slf4j
@Component
public class FileBackupDataFlowJob implements DataflowJob<FileCustom> {

    @Resource
    private FileCustomService fileCustomService;

    // Fetch data
    @Override
    public List<FileCustom> fetchData(ShardingContext shardingContext) {
        // Get the sharding parameter
        // Here the sharding parameter is the file type; the job is sharded by type
        // "0=txt,1=jpg,2=png,3=video"
        String jobParameter = shardingContext.getShardingParameter();
        log.info("Sharding item: " + shardingContext.getShardingItem() + ", processing " + jobParameter + " files");
        // Fetch the files to back up
        return fileCustomService.getFileList(jobParameter,1);
    }

    // Process data
    @Override
    public void processData(ShardingContext shardingContext, List<FileCustom> data) {
        backupFiles(data);
    }
    // Mark the files as backed up
    public void backupFiles(List<FileCustom> list){
        list.forEach(fileCustom -> {
            fileCustom.setBackup(true);
            fileCustomService.updateById(fileCustom);
            log.info(fileCustom.getName() + " backed up");
        });
    }
}

Choose one of the two forms above to implement the business logic (both fetch the data and process it).

(4) Define the job configuration class, configuring sharding and other job information:

import com.dangdang.ddframe.job.api.ElasticJob;
import com.dangdang.ddframe.job.config.JobCoreConfiguration;
import com.dangdang.ddframe.job.config.dataflow.DataflowJobConfiguration;
import com.dangdang.ddframe.job.config.simple.SimpleJobConfiguration;
import com.dangdang.ddframe.job.event.rdb.JobEventRdbConfiguration;
import com.dangdang.ddframe.job.lite.config.LiteJobConfiguration;
import com.dangdang.ddframe.job.lite.spring.api.SpringJobScheduler;
import com.dangdang.ddframe.job.reg.base.CoordinatorRegistryCenter;
import com.seven.scheduler.job.FileBackupDataFlowJob;
import com.seven.scheduler.job.FileBackupJob;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.annotation.Resource;
import javax.sql.DataSource;

@Configuration
@Slf4j
public class ElasticJobConfig {

    @Resource
    private FileBackupJob fileBackupJob;
    @Resource
    private FileBackupDataFlowJob fileBackupDataFlowJob;
    @Resource
    private CoordinatorRegistryCenter registryCenter;
    // Data source for the job event log tables (from the project's Spring datasource configuration)
    @Resource
    private DataSource dataSource;

    @Bean(initMethod = "init")
    public SpringJobScheduler initSimpleElasticJob(){
        // Generate the job event log tables (execution events are persisted to the database)
        JobEventRdbConfiguration jobEventRdbConfiguration = new JobEventRdbConfiguration(dataSource);
        return new SpringJobScheduler(fileBackupJob,registryCenter,
                createJobConfiguration(fileBackupJob.getClass(),"0 20 17 * * ?",
                        2,"0=txt,1=jpg"),jobEventRdbConfiguration);
    }


    private LiteJobConfiguration createJobConfiguration(final Class<? extends ElasticJob> jobClass,
                                                        final String cron, final int shardingTotalCount,
                                                        final String shardingItemParameters){
        JobCoreConfiguration.Builder builder = JobCoreConfiguration.newBuilder(jobClass.getName(), cron, shardingTotalCount);
        if (!StringUtils.isEmpty(shardingItemParameters)){
            builder.shardingItemParameters(shardingItemParameters);
        }
        JobCoreConfiguration jobCoreConfiguration = builder.build();
        // SimpleJob form
        SimpleJobConfiguration simpleJobConfiguration = new SimpleJobConfiguration(jobCoreConfiguration,jobClass.getCanonicalName());
        return LiteJobConfiguration.newBuilder(simpleJobConfiguration).overwrite(true).build();
        
        // DataflowJob form; the final argument 'true' enables streaming fetch
//        DataflowJobConfiguration dataflowJobConfiguration =
//                new DataflowJobConfiguration(jobCoreConfiguration, jobClass.getCanonicalName(), true);
//        return LiteJobConfiguration.newBuilder(dataflowJobConfiguration).overwrite(true).build();
    }
}

In the code above, the job is divided into two sharding items: item 0 backs up txt data and item 1 backs up jpg data; the job is executed at 17:20 every day (see cron expressions for details).

(If you choose the DataflowJob form, replace fileBackupJob with fileBackupDataFlowJob and SimpleJobConfiguration with DataflowJobConfiguration.)
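For reference, a hypothetical sketch of that DataflowJob variant as a separate scheduler bean in the same configuration class (the bean method name is illustrative; it reuses the fileBackupDataFlowJob, registryCenter and dataSource fields already injected above, and in practice you would register either this bean or initSimpleElasticJob, not both):

    @Bean(initMethod = "init")
    public SpringJobScheduler initDataflowElasticJob() {
        JobEventRdbConfiguration jobEventRdbConfiguration = new JobEventRdbConfiguration(dataSource);
        JobCoreConfiguration coreConfig = JobCoreConfiguration
                .newBuilder(fileBackupDataFlowJob.getClass().getName(), "0 20 17 * * ?", 2)
                .shardingItemParameters("0=txt,1=jpg")
                .build();
        // The final 'true' enables streaming: fetchData/processData repeat until fetchData returns an empty result
        DataflowJobConfiguration dataflowConfig = new DataflowJobConfiguration(
                coreConfig, fileBackupDataFlowJob.getClass().getCanonicalName(), true);
        return new SpringJobScheduler(fileBackupDataFlowJob, registryCenter,
                LiteJobConfiguration.newBuilder(dataflowConfig).overwrite(true).build(),
                jobEventRdbConfiguration);
    }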

(5) Now we can run it. Start two processes separately to process the data. The execution effect is as follows:

It can be seen that the task was automatically executed at 17:20.

Sharding item 0 processed the txt files and item 1 processed the jpg files; the two processes handled their respective files, achieving a reasonable allocation of resources and avoiding duplicate processing of data.


Origin blog.csdn.net/tang_seven/article/details/130562955