SchedulerX 2.0 distributed computing principles & best practices

1. Introduction

The SchedulerX 2.0 client provides distributed execution, multiple task types, and a unified logging framework. By simply depending on the schedulerx-worker JAR package and writing a few lines of code against the SchedulerX 2.0 programming model, users get a highly reliable, easy-to-operate distributed execution engine.

This article focuses on the principles behind the SchedulerX 2.0 distributed execution engine and its best practices. After reading it, you should be able to write efficient distributed jobs, possibly speeding them up several times over :)

2. Scalable execution engine

The overall worker architecture references the YARN architecture and is divided into three layers: TaskMaster, Container, and Processor:

[Figure: worker architecture with the TaskMaster, Container, and Processor layers]

  • TaskMaster: similar to YARN's AppMaster. It supports an extensible, distributed execution framework and is responsible for managing the entire lifecycle of a JobInstance, managing container resources, and handling failover. Default implementations include StandaloneTaskMaster (standalone execution), BroadcastTaskMaster (broadcast execution), MapTaskMaster (parallel computing, memory grid, grid computing), and MapReduceTaskMaster (parallel computing, memory grid, grid computing).
  • Container: the framework that executes the business logic; it supports threads, processes, Docker, actors, and so on (a simplified sketch follows this list).
  • Processor: the business-logic framework; different processors represent different task types.
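
To make the relationship between these layers concrete, here is a rough, simplified sketch (not the actual SchedulerX worker code) of how a thread-based Container might run a Processor. The reportStatus helper is a hypothetical placeholder; MapJobProcessor, JobContext, and ProcessResult are the same classes used in the code examples later in this article.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Simplified illustration of a thread-based Container running a Processor.
public class ThreadContainerSketch {
    private final ExecutorService containerPool = Executors.newFixedThreadPool(8);

    public void runTask(MapJobProcessor processor, JobContext context) {
        containerPool.submit(() -> {
            try {
                // the Container simply invokes the Processor's business logic
                ProcessResult result = processor.process(context);
                reportStatus(context, result);
            } catch (Exception e) {
                reportStatus(context, new ProcessResult(false));
            }
        });
    }

    private void reportStatus(JobContext context, ProcessResult result) {
        // hypothetical placeholder: the real worker reports the result back to the TaskMaster
    }
}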

Taking MapTaskMaster as an example, the principle is roughly as shown below:

[Figure: MapTaskMaster execution principle]

3. The Map distributed programming model

SchedulerX 2.0 offers several distributed programming models. This article introduces the Map model (a later article will cover the MapReduce model, which fits more business scenarios). With just a few lines of code, massive amounts of data can be distributed across multiple machines and processed in parallel batches. It is very simple to use.

For different batch-processing scenarios, Map-model jobs provide three execution modes: parallel computing, memory grid, and grid computing (a minimal code sketch follows this list):

  • Parallel computing: up to 300 subtasks; provides a subtask list.
  • Memory grid: up to 50,000 subtasks; no subtask list; fast.
  • Grid computing: up to 1,000,000 subtasks; no subtask list.
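
As a warm-up, here is a minimal, illustrative sketch of a Map-model job: the root task fans out a handful of subtasks and each subtask just prints its payload. It uses the same MapJobProcessor API (process, getTaskName, getTask, map, ProcessResult) as the realistic examples in section 6; the class and subtask names here are made up.

import java.util.Arrays;
import java.util.List;

// Minimal Map-model sketch: the root task fans out a few subtasks,
// and each subtask simply prints its payload.
public class HelloMapJobProcessor extends MapJobProcessor {

    @Override
    public ProcessResult process(JobContext context) {
        String taskName = context.getTaskName();
        if (WorkerConstants.MAP_TASK_ROOT_NAME.equals(taskName)) {
            List<String> subTasks = Arrays.asList("a", "b", "c");
            return map(subTasks, "HelloSubTask"); // fan the subtasks out to the workers
        } else if ("HelloSubTask".equals(taskName)) {
            System.out.println("processing " + context.getTask());
            return new ProcessResult(true);
        }
        return new ProcessResult(true);
    }
}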

4. Parallel computing principle

Parallel computing jobs have a subtask list:

[Figure: subtask list in the console]
As shown above, the subtask list lets you see each subtask's status and the machine it ran on, and offers operations such as re-running a subtask and viewing its logs.

Because parallel computing provides subtask-level visibility, and must support manually re-running subtasks even after a worker crashes or restarts, tasks need to be persisted on the server side:

[Figure: parallel computing, with tasks persisted on the server]
As shown in the figure above:

  1. The server triggers a JobInstance on some worker, which is selected as the master.
  2. The MapTaskMaster selects a worker to execute the root task; when the map method is executed, it calls back into the MapTaskMaster.
  3. When the MapTaskMaster receives the map call, it persists the tasks to the server.
  4. At the same time, the MapTaskMaster runs a pull thread that continuously pulls tasks in the INIT state and dispatches them to other workers for execution (a simplified sketch of this loop follows the list).
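
A rough, simplified sketch of the pull-and-dispatch loop described in steps 3 and 4. This only illustrates the idea; the names TaskStore, pullInitTasks, and markDispatched are invented for the sketch, and in parallel computing mode the store sits on the SchedulerX server.

import java.util.List;

// Simplified illustration of the MapTaskMaster pull thread (steps 3 and 4 above).
// All names below are hypothetical; in parallel computing mode the task store is
// backed by the SchedulerX server.
public class PullThreadSketch implements Runnable {

    /** Hypothetical view of the persisted task table. */
    interface TaskStore {
        List<String> pullInitTasks(int batchSize);         // task ids still in INIT state
        void markDispatched(String taskId, String worker); // bookkeeping: INIT -> dispatched
    }

    private final TaskStore store;
    private final List<String> workers;
    private int next = 0;

    public PullThreadSketch(TaskStore store, List<String> workers) {
        this.store = store;
        this.workers = workers;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            for (String taskId : store.pullInitTasks(100)) {
                String worker = workers.get(next);          // round-robin choice of worker
                next = (next + 1) % workers.size();
                store.markDispatched(taskId, worker);
                // hypothetical: send the task to the chosen worker for execution
            }
        }
    }
}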

5. Grid computing principle

Grid computing has to support tasks at the million scale. If every task were written back to the server, the server could never handle the load, so grid computing actually stores tasks in a distributed fashion on the users' own machines:
[Figure: grid computing, with tasks persisted in a local H2 database]
As shown in the figure above:

  1. The server triggers a JobInstance on some worker, which is selected as the master.
  2. The MapTaskMaster selects a worker to execute the root task; when the map method is executed, it calls back into the MapTaskMaster.
  3. When the MapTaskMaster receives the map call, it persists the tasks to a local H2 database (an illustrative snippet follows this list).
  4. At the same time, the MapTaskMaster runs a pull thread that continuously pulls tasks in the INIT state and dispatches them to other workers for execution.
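
To make step 3 concrete, here is a small, hedged illustration of persisting subtasks into an embedded H2 database on the worker's own machine. The table name and columns are invented for this sketch; it only illustrates the idea that storage is local, not the actual SchedulerX schema.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

// Illustration only: storing subtasks in a local embedded H2 database.
// Table name and columns are invented; requires the h2 jar on the classpath.
public class LocalH2TaskStoreSketch {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./schedulerx_task_store")) {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE TABLE IF NOT EXISTS task ("
                        + "task_id BIGINT PRIMARY KEY, "
                        + "task_name VARCHAR(64), "
                        + "status VARCHAR(16), "
                        + "body BLOB)");
            }
            // persist one subtask in INIT state; the pull thread later scans for INIT rows
            String sql = "INSERT INTO task (task_id, task_name, status, body) VALUES (?, ?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, 1L);
                ps.setString(2, "subTask");
                ps.setString(3, "INIT");
                ps.setBytes(4, new byte[0]); // serialized task payload would go here
                ps.executeUpdate();
            }
        }
    }
}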

6. Best practices

6.1 Requirements

Consider the following example:

  1. Read the rows with status=0 from table A.
  2. Process these rows and insert them into table B.
  3. Update the processed rows in table A to status=1.
  4. The data volume is 400 million+ rows, and the processing time should be as short as possible.

6.2 Negative example

First, does the following code have any problems?

public class ScanSingleTableProcessor extends MapJobProcessor {
    private static final int pageSize = 1000;

    @Override
    public ProcessResult process(JobContext context) {
        String taskName = context.getTaskName();
        Object task = context.getTask();

        if (WorkerConstants.MAP_TASK_ROOT_NAME.equals(taskName)) {
            int recordCount = queryRecordCount();
            int pageAmount = recordCount / pageSize; // compute the number of pages
            for (int i = 0; i < pageAmount; i++) {
                List<Record> recordList = queryRecord(i); // query one page of records
                map(recordList, "record记录"); // dispatch the subtasks for parallel processing
            }
            return new ProcessResult(true); // true means success, false means failure
        } else if ("record记录".equals(taskName)) {
            //TODO
            return new ProcessResult(true);
        }
        return new ProcessResult(false);
    }

    private int queryRecordCount() {
        //TODO select count(*) from A where status = 0
        return 0;
    }

    private List<Record> queryRecord(int pageIndex) {
        //TODO query one page of records from table A by page index
        return null;
    }
}

As the code above shows, the root task reads every record in the database, turns each row into a Record, and then dispatches them to be executed on different workers. The logic is correct, but in practice the performance is very poor. Combining this with the grid computing principle, the code above can be depicted as the following figure:
[Figure: data flow of the negative example]
As shown above, the root task first reads the full contents of table A and stores all of it in H2; the pull thread then reads all of the tasks back out of H2 and dispatches them to all of the clients. So the data in table A is:

  • fully read twice,
  • fully written once,
  • fully transmitted once.

This is very inefficient.
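
For a rough sense of scale (assuming, purely for illustration, an average row size of about 1 KB): 400 million rows is roughly 400 GB. Reading that twice, writing it once into the task store, and sending it once over the network moves well over a terabyte of data just to dispatch the subtasks, before any real processing of table A has happened.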

6.3 Positive example

Here is a positive code example:

public class ScanSingleTableJobProcessor extends MapJobProcessor {
    private static final int pageSize = 100;

    static class PageTask {
        private int startId;
        private int endId;
        public PageTask(int startId, int endId) {
            this.startId = startId;
            this.endId = endId;
        }
        public int getStartId() {
            return startId;
        }
        public int getEndId() {
            return endId;
        }
    }

    @Override
    public ProcessResult process(JobContext context) {
        String taskName = context.getTaskName();
        Object task = context.getTask();
        if (taskName.equals(WorkerConstants.MAP_TASK_ROOT_NAME)) {
            System.out.println("start root task");
            Pair<Integer, Integer> idPair = queryMinAndMaxId();
            int minId = idPair.getFirst();
            int maxId = idPair.getSecond();
            List<PageTask> taskList = Lists.newArrayList();
            int step = (int) ((maxId - minId) / pageSize); // id range covered by each PageTask
            for (int i = minId; i < maxId; i += step) {
                taskList.add(new PageTask(i, (i + step > maxId ? maxId : i + step)));
            }
            return map(taskList, "Level1Dispatch");
        } else if (taskName.equals("Level1Dispatch")) {
            PageTask record = (PageTask) task;
            long startId = record.getStartId();
            long endId = record.getEndId();
            //TODO process the rows of table A whose id is in [startId, endId]
            return new ProcessResult(true);
        }
        return new ProcessResult(true);
    }

    @Override
    public void postProcess(JobContext context) {
        //TODO
        System.out.println("all tasks are finished.");
    }

    private Pair<Integer, Integer> queryMinAndMaxId() {
        //TODO select min(id),max(id) from xxx
        return null;
    }
}

As the code above shows:

  • Each task is no longer an entire Record row, but a PageTask containing just two fields: startId and endId.
  • The root task does not read the whole of table A; it only reads the table's minId and maxId and then constructs PageTasks for paging. For example, task1 might represent PageTask[1, 1000] and task2 PageTask[1001, 2000]. Each task processes a different portion of table A.
  • In the next-level tasks, when a PageTask is received, only the rows of table A within that id range are processed.

Combining the code above with the grid computing principle gives the following figure:
[Figure: data flow of the positive example]
As shown above:

  • Table A only needs to be fully read once.
  • The number of subtasks is thousands of times smaller than in the negative example (a rough back-of-envelope follows this list).
  • The subtask payload is tiny; if a Record contains large fields, the payload is thousands or even tens of thousands of times smaller.
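
A rough back-of-envelope, assuming the 400 million+ rows from section 6.1 and id ranges of about 1,000 ids per PageTask (as in the example above): the negative example creates roughly 400 million subtasks, one per Record, while the positive example creates roughly 400,000 PageTasks, about a thousand times fewer, and each PageTask carries only two ints instead of an entire row.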

To sum up, accesses to table A are cut by several times and the pressure on H2 storage is thousands of times lower; execution is not only much faster, it is also guaranteed not to overwhelm the worker's local H2 database.

Source: yq.aliyun.com/articles/704121