An Overview of Storm's Scheduler

I. Scheduler Overview

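The Scheduler is Storm's scheduling component: it allocates the resources currently available in the cluster to Topologies. Storm defines the IScheduler interface, which users can implement to define their own Scheduler. Storm ships with several Schedulers: EvenScheduler, DefaultScheduler, IsolationScheduler, the Pluggable Scheduler, MultitenantScheduler, and ResourceAwareScheduler. Each is introduced briefly below.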

1. EvenScheduler:
Distributes the available resources in the cluster evenly among the Topologies that currently need task assignment.
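To make "evenly" concrete, here is a minimal self-contained sketch of a round-robin spread of executors over slots. It is not Storm's EvenScheduler code; the class name EvenSpreadSketch, the method spread, and the slot labels are hypothetical and only illustrate the idea that slot loads end up differing by at most one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this does not use or reproduce Storm's API.
final class EvenSpreadSketch {

    // Spread executors over slots round-robin, so slot loads differ by at most one.
    static Map<String, List<String>> spread(List<String> executors, List<String> slots) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        if (slots.isEmpty()) {
            return assignment; // nothing can be assigned without slots
        }
        for (String slot : slots) {
            assignment.put(slot, new ArrayList<>());
        }
        for (int i = 0; i < executors.size(); i++) {
            String slot = slots.get(i % slots.size());
            assignment.get(slot).add(executors.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> executors = Arrays.asList("[1-1]", "[2-2]", "[3-3]", "[4-4]", "[5-5]");
        List<String> slots = Arrays.asList("node1:6700", "node2:6700");
        System.out.println(spread(executors, slots));
    }
}
```

With five executors and two slots, the first slot receives three executors and the second receives two.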

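2. DefaultScheduler:
Essentially the same as EvenScheduler. The only difference is that, before assigning tasks to a Topology, it first frees the resources that other Topologies no longer need, and then calls EvenScheduler to distribute resources evenly among the Topologies.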

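3. IsolationScheduler:
Provides a mechanism that lets users reserve machine resources (a number of machines) for specific Topologies. Users declare this in the Storm configuration (the topology name and the number of machines it needs). IsolationScheduler assigns tasks to these Topologies first and guarantees that a machine given to one of them runs only that Topology, so the runtime environments of these Topologies are isolated from one another. Once the listed Topologies have been assigned, it calls DefaultScheduler to assign tasks for the remaining Topologies using whatever resources are left in the cluster.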
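The reservation step can be sketched as follows. This is a simplified model under stated assumptions, not IsolationScheduler's real logic: the class IsolationSketch, the method reserve, the "<shared>" key, and the machine names are all hypothetical. It hands whole machines to the listed topologies first and leaves the rest for everyone else.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this does not use or reproduce Storm's API.
final class IsolationSketch {

    // Reserve whole machines for each isolated topology (name -> machine count).
    // Machines that nobody reserved are returned under the key "<shared>".
    static Map<String, List<String>> reserve(List<String> machines, Map<String, Integer> isolated) {
        Deque<String> free = new ArrayDeque<>(machines);
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : isolated.entrySet()) {
            List<String> dedicated = new ArrayList<>();
            for (int i = 0; i < entry.getValue() && !free.isEmpty(); i++) {
                dedicated.add(free.poll()); // this machine now runs only this topology
            }
            result.put(entry.getKey(), dedicated);
        }
        result.put("<shared>", new ArrayList<>(free));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> isolated = new LinkedHashMap<>();
        isolated.put("topology-a", 2);
        System.out.println(reserve(Arrays.asList("m1", "m2", "m3", "m4"), isolated));
    }
}
```

In this model the shared remainder is what a default scheduler would then divide among the topologies that were not listed.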

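4. Pluggable Scheduler:
A pluggable task allocator: you write your own task assignment algorithm and implement your own scheduler, which replaces the default scheduler in assigning executors to workers. Set storm.scheduler in storm.yaml to your class. The custom scheduler must implement the IScheduler interface.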

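5. MultitenantScheduler:
This scheduling mode builds a dedicated, isolated resource pool for each topology submitter. It then iterates over the set of topologies and allocates nodes by associating each topology with its submitter's pool.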

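6. ResourceAwareScheduler:
The resource-aware scheduler allocates resources on a per-user basis. Each user can be guaranteed a certain amount of resources for running his or her topologies, and the scheduler tries to honor those guarantees as far as possible. When the Storm cluster has spare resources, the scheduler distributes them among users in a fair manner.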

II. The Scheduler Interface

To write a custom Scheduler, you implement the IScheduler interface, which Storm defines for assigning tasks to all Topologies currently in the cluster. It is defined as follows:

public interface IScheduler {
    
    void prepare(Map conf);
    
    /**
     * Set assignments for the topologies which needs scheduling. The new assignments is available 
     * through <code>cluster.getAssignments()</code>
     *
     *@param topologies all the topologies in the cluster, some of them need schedule. Topologies object here 
     *       only contain static information about topologies. Information like assignments, slots are all in
     *       the <code>cluster</code>object.
     *@param cluster the cluster these topologies are running in. <code>cluster</code> contains everything user
     *       need to develop a new scheduling logic. e.g. supervisors information, available slots, current 
     *       assignments for all the topologies etc. User can set the new assignment for topologies using
     *       <code>cluster.setAssignmentById</code>
     */
    void schedule(Topologies topologies, Cluster cluster);
}

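This definition involves two main methods:

1. prepare: receives the current Nimbus's Storm configuration as its argument and uses it for any initialization.

2. schedule: the method that actually performs task assignment. Nimbus calls it whenever it assigns tasks. Its parameters are topologies and cluster. The former holds information about every Topology in the current cluster. The latter represents the cluster itself and contains everything needed to write custom scheduling logic, including Supervisor information, all currently available slots, and the current task assignments.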

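III. Storm Scheduling Terminology

Before reading Storm's Scheduler code, it helps to be clear about a few concepts. They make the scheduling process described later easier to follow.

1. slot: one unit of resource on a Supervisor node. Each slot corresponds to one port, and a slot can be occupied by only one Worker.

2. Worker, Executor, Task: a Worker contains one or more Executors, and each Executor contains one or more Tasks.

3. An Executor is written as [1-1], [2-2], and so on, where the numbers inside the brackets are the first and last Task ids of that Executor. A Worker is the same notation wrapped in braces: {[1-1],[2-2]}.

4. Component: each component in Storm is one kind of Spout or one kind of Bolt. The term refers to the named type, not to the number of instances.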
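The Executor and Worker notation from items 2 and 3 can be reproduced with a small self-contained sketch. The class ExecutorRangeSketch and its methods partition and worker are hypothetical names, and the way tasks are split into contiguous ranges here is an assumption made for illustration, not a statement about Storm's internal algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this does not use or reproduce Storm's API.
final class ExecutorRangeSketch {

    // Split task ids 1..numTasks into numExecutors contiguous ranges, each written as [start-end].
    static List<String> partition(int numTasks, int numExecutors) {
        if (numExecutors <= 0 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= numExecutors <= numTasks");
        }
        List<String> ranges = new ArrayList<>();
        int base = numTasks / numExecutors;
        int extra = numTasks % numExecutors; // the first 'extra' executors get one more task
        int start = 1;
        for (int i = 0; i < numExecutors; i++) {
            int size = base + (i < extra ? 1 : 0);
            int end = start + size - 1;
            ranges.add("[" + start + "-" + end + "]");
            start = end + 1;
        }
        return ranges;
    }

    // Render a worker as the brace-wrapped list of its executors.
    static String worker(List<String> executors) {
        return "{" + String.join(",", executors) + "}";
    }

    public static void main(String[] args) {
        System.out.println(worker(partition(2, 2))); // prints {[1-1],[2-2]}
        System.out.println(worker(partition(5, 2))); // prints {[1-3],[4-5]}
    }
}
```

An Executor such as [1-3] holds Tasks 1, 2 and 3, while [1-1] holds exactly one Task.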

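IV. A Custom Scheduler Example

The following custom direct-assignment scheduler is taken from the article "Storm自定义调度器实现--DirectScheduler" (Implementing a custom Storm scheduler: DirectScheduler).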

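// Imports assume Storm 1.x package names (org.apache.storm); releases before 1.0 use backtype.storm instead.
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

import clojure.lang.PersistentArrayMap;
import org.apache.storm.scheduler.Cluster;
import org.apache.storm.scheduler.EvenScheduler;
import org.apache.storm.scheduler.ExecutorDetails;
import org.apache.storm.scheduler.IScheduler;
import org.apache.storm.scheduler.SchedulerAssignment;
import org.apache.storm.scheduler.SupervisorDetails;
import org.apache.storm.scheduler.Topologies;
import org.apache.storm.scheduler.TopologyDetails;
import org.apache.storm.scheduler.WorkerSlot;

// DirectScheduler narrows the scheduling unit down to the component level: a given Spout or Bolt can be pinned to a specific node.
// Components that are not pinned are still scheduled by the built-in scheduler. The mapping is set in the Conf submitted with the Topology.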
public class DirectScheduler implements IScheduler{

@Override
public void prepare(Map conf) {

}

@Override
public void schedule(Topologies topologies, Cluster cluster) {
    System.out.println("DirectScheduler: begin scheduling");
    // Walk through every topology in the cluster.
    for (TopologyDetails td : topologies.getTopologies()) {
        Map map = td.getConf();
        // Flag that tells whether this topology asks for direct assignment.
        String assignedFlag = (String) map.get("assigned_flag");

        // A flag of "1" means the topology is assigned here; otherwise it is left to the built-in scheduler.
        if (assignedFlag != null && assignedFlag.equals("1")) {
            System.out.println("finding topology named " + td.getName());
            topologyAssign(cluster, td, map);
        } else {
            System.out.println("topology " + td.getName() + " is not flagged for direct assignment");
        }
    }

    // Everything that is still unassigned is handled by the built-in scheduler.
    new EvenScheduler().schedule(topologies, cluster);
}


/**
 * Schedules a single topology.
 * @param cluster
 * the cluster
 * @param topology
 * the topology to schedule
 * @param map
 * the topology's configuration map
 */
private void topologyAssign(Cluster cluster, TopologyDetails topology, Map map){
    Set<String> keys;
    PersistentArrayMap designMap;
    Iterator<String> iterator;

    iterator = null;
    // make sure the special topology is submitted,
    if (topology != null) {
        designMap = (PersistentArrayMap)map.get("design_map");
        if(designMap != null){
            System.out.println("design map size is " + designMap.size());
            keys = designMap.keySet();
            iterator = keys.iterator();

            System.out.println("keys size is " + keys.size());
        }

        if(designMap == null || designMap.size() == 0){
            System.out.println("design map is null");
        }

        boolean needsScheduling = cluster.needsScheduling(topology);

        if (!needsScheduling) {
            System.out.println("Our special topology does not need scheduling.");
        } else {
            System.out.println("Our special topology needs scheduling.");
            // find out all the needs-scheduling components of this topology
            Map<String, List<ExecutorDetails>> componentToExecutors = cluster.getNeedsSchedulingComponentToExecutors(topology);

            System.out.println("needs scheduling(component->executor): " + componentToExecutors);
            System.out.println("needs scheduling(executor->components): " + cluster.getNeedsSchedulingExecutorToComponents(topology));
            SchedulerAssignment currentAssignment = cluster.getAssignmentById(topology.getId());
            if (currentAssignment != null) {
                System.out.println("current assignments: " + currentAssignment.getExecutorToSlot());
            } else {
                System.out.println("current assignments: {}");
            }

            String componentName;
            String nodeName;
            if(designMap != null && iterator != null){
                while (iterator.hasNext()){
                    componentName = iterator.next();
                    nodeName = (String)designMap.get(componentName);

                    System.out.println("Scheduling component -> node: " + componentName + " -> " + nodeName);
                    componentAssign(cluster, topology, componentToExecutors, componentName, nodeName);
                }
            }
        }
    }
}

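/**
 * Schedules a single component.
 * @param cluster
 * information about the cluster
 * @param topology
 * details of the topology being scheduled
 * @param totalExecutors
 * the executors that still need scheduling, keyed by component
 * @param componentName
 * the name of the component
 * @param supervisorName
 * the name of the target node
 */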
private void componentAssign(Cluster cluster, TopologyDetails topology, Map<String, List<ExecutorDetails>> totalExecutors, String componentName, String supervisorName){
    if (!totalExecutors.containsKey(componentName)) {
        System.out.println("Component " + componentName + " does not need scheduling.");
    } else {
        System.out.println("Component " + componentName + " needs scheduling.");
        List<ExecutorDetails> executors = totalExecutors.get(componentName);

        // find out the our "special-supervisor" from the supervisor metadata
        Collection<SupervisorDetails> supervisors = cluster.getSupervisors().values();
        SupervisorDetails specialSupervisor = null;
        for (SupervisorDetails supervisor : supervisors) {
            Map meta = (Map) supervisor.getSchedulerMeta();

            if(meta != null && meta.get("name") != null){
                System.out.println("supervisor name:" + meta.get("name"));

                if (meta.get("name").equals(supervisorName)) {
                    System.out.println("Supervisor finding");
                    specialSupervisor = supervisor;
                    break;
                }
            }else {
                System.out.println("Supervisor meta null");
            }

        }

        // found the special supervisor
        if (specialSupervisor != null) {
            System.out.println("Found the special-supervisor");
            List<WorkerSlot> availableSlots = cluster.getAvailableSlots(specialSupervisor);

            // If the target node has no free slot left, forcibly free the slots it is using.
            if (availableSlots.isEmpty() && !executors.isEmpty()) {
                for (Integer port : cluster.getUsedPorts(specialSupervisor)) {
                    cluster.freeSlot(new WorkerSlot(specialSupervisor.getId(), port));
                }
            }

            // Fetch the available slots again.
            availableSlots = cluster.getAvailableSlots(specialSupervisor);

            // Assign to the first slot on the node, guarding against a node that still has none.
            if (!availableSlots.isEmpty()) {
                WorkerSlot slot = availableSlots.get(0);
                cluster.assign(slot, topology.getId(), executors);
                System.out.println("We assigned executors:" + executors + " to slot: [" + slot.getNodeId() + ", " + slot.getPort() + "]");
            } else {
                System.out.println("No available slot on the special-supervisor");
            }
        } else {
            System.out.println("No supervisor named " + supervisorName + " was found");
        }
    }
}

}

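Usage:

1. Package this project and copy the jar into the STORM_HOME/lib directory of the Storm installation on the nimbus node.

2. In storm.yaml on the nimbus node, set: storm.scheduler: "storm.DirectScheduler"

3. On each supervisor node, configure a name with the following setting:

supervisor.scheduler.meta:
  name: "your-supervisor-name"

4. Restart the nimbus and supervisor nodes. The cluster only needs to be configured once.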
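The steps above cover the cluster side. On the topology side, DirectScheduler reads two entries from the topology configuration: "assigned_flag" and "design_map" (component name to supervisor name). The sketch below builds such a configuration with plain java.util maps. The class DirectSchedulerConfSketch, the method buildConf, and the component and supervisor names are hypothetical. In a real topology these entries would go onto the configuration object passed at submission, and the sketch does not model how Storm serializes that configuration on its way to Nimbus.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; a plain map stands in for the topology configuration.
final class DirectSchedulerConfSketch {

    static Map<String, Object> buildConf() {
        Map<String, Object> conf = new HashMap<>();

        // "1" asks DirectScheduler to place this topology itself.
        conf.put("assigned_flag", "1");

        // Component name -> supervisor name, matching supervisor.scheduler.meta "name".
        Map<String, String> designMap = new HashMap<>();
        designMap.put("word-spout", "supervisor-a"); // hypothetical component and node names
        designMap.put("count-bolt", "supervisor-b");
        conf.put("design_map", designMap);

        return conf;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = buildConf();
        System.out.println("assigned_flag = " + conf.get("assigned_flag"));
        System.out.println("design_map size = " + ((Map<?, ?>) conf.get("design_map")).size());
    }
}
```

Components missing from design_map, and topologies whose flag is not "1", are left to the EvenScheduler call at the end of schedule.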

Reposted from: https://www.jianshu.com/p/664a82bf699e