JStorm: task scheduling

The previous article, JStorm: Concept and programming model, introduced the basic concepts of JStorm and its programming model. This article describes the author's understanding of JStorm task scheduling, covering three aspects:

  • Scheduling roles
  • Scheduling
  • Custom scheduling

Scheduling roles

Task role structure
The figure shows how a topology is mapped onto execution units in JStorm: a worker corresponds to a process, an executor corresponds to a thread, and a task corresponds to an instance of a spout or bolt component.

Worker

A worker is a container for tasks; a single worker only executes tasks belonging to one topology. A topology may run in one or more workers (worker processes), each of which executes a portion of the whole topology. For example, if a topology has a total parallelism of 300 and 50 worker processes are used, each worker handles 6 of its tasks. Storm tries to spread the tasks evenly across all workers.

Executor

An executor is an execution thread inside a worker. All tasks in the same executor belong to the same component, i.e. the same spout or the same bolt. Note that an executor runs only one task at a time; allowing several tasks to be placed in one executor was mainly intended, in early Storm, to make it possible to scale up the number of threads later (to be verified). In JStorm the number of tasks can be changed at rebalance time, so the number of executors should not be greater than the number of tasks.

Task

A task is the actual unit of work run by an executor; it corresponds to an instance of a spout or bolt component created for the topology. Each spout and bolt is executed across the cluster as many tasks. The degree of parallelism (that is, the number of tasks) is set when calling setSpout and setBolt on the TopologyBuilder class.
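As a minimal sketch of this API, using the 300-task / 50-worker example from the Worker section above; the component names and the ReaderSpout/ParseBolt classes are hypothetical placeholders, and the snippet is a fragment in the same style as the scheduling example in the next section:

Config conf = new Config();
conf.setNumWorkers(50);                                    // 50 worker processes for the whole topology

TopologyBuilder builder = new TopologyBuilder();
// The parallelism hint sets the number of executors; setNumTasks sets the number of tasks
// (if setNumTasks is not called, the number of tasks equals the parallelism hint).
builder.setSpout("reader-spout", new ReaderSpout(), 100);  // 100 executors, 100 tasks
builder.setBolt("parse-bolt", new ParseBolt(), 100)
       .setNumTasks(200)                                   // 100 executors running 200 tasks
       .shuffleGrouping("reader-spout");
// 300 tasks in total, so each of the 50 workers handles 6 tasks on average.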

Scheduling

The default scheduling algorithm

The default scheduling algorithm follows these principles:

  1. Scheduling is done at the worker level, and workers are distributed as evenly as possible across the supervisors;
  2. With the worker as the unit, the correspondence between the number of workers and the number of tasks is roughly fixed first (note that workers previously occupied by other topologies and no longer in use are not involved in this operation);
  3. The task-to-worker relationship is established with the following priorities: try to avoid placing tasks of the same kind in the same worker and supervisor; try to keep tasks evenly distributed across workers and supervisors; and try to place tasks that exchange data directly in the same worker;
  4. A scheduling action in progress does not affect scheduling actions that have already taken place.

Scheduling Example

Consider a topology created with the configuration code below; the figure shows how it is scheduled at runtime.

// Topology configuration code for the scheduling example
Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes

TopologyBuilder topologyBuilder = new TopologyBuilder();
topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2);
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout");
topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
               .shuffleGrouping("green-bolt");
StormSubmitter.submitTopology("mytopology", conf,
        topologyBuilder.createTopology());

[Figure: the resulting schedule]
With the code above and the default scheduling algorithm: the topology is set to 2 workers; the blue spout has a parallelism of 2, so by default it also has 2 tasks; the green bolt has a parallelism of 2 but is set to 4 tasks, so each of its executors runs 2 tasks; the yellow bolt has a parallelism of 6 and, by default, 6 tasks. In total there are 10 executors (2 + 2 + 6) running 12 tasks (2 + 4 + 6) across the 2 workers, i.e. 5 executors per worker.
The figure matches the 2-worker case and shows that JStorm tries to distribute tasks as evenly as possible when making assignments; it does not mean every scenario will look exactly like this.

Distribution process

Storm task distribution process
The figure uses Storm as the example; the JStorm process is identical.
JStorm task distribution process:

  1. The client submits the topology to Nimbus, and execution begins;
  2. Nimbus creates a local directory for the topology, computes the tasks according to the topology configuration, makes the task assignment, and stores the assignment node in ZooKeeper, recording the mapping between tasks and the supervisor (worker machine) nodes;
  3. Nimbus creates the taskbeats node on ZooKeeper to monitor the heartbeats of the tasks, and starts the topology;
  4. Each supervisor fetches its assigned tasks from ZooKeeper and starts one or more workers; each worker creates its tasks, and the tasks establish connections with one another based on the topology information during initialization.

Custom scheduling

JStorm supports the following custom scheduling settings (a combined usage sketch follows the list):

  1. Set the default memory size of each worker

     ConfigExtension.setMemSizePerWorker(Map conf, long memSize)

  2. Set the cgroup CPU weight of each worker

     ConfigExtension.setCpuSlotNumPerWorker(Map conf, int slotNum)

  3. Set whether to use the old assignment

     ConfigExtension.setUseOldAssignment(Map conf, boolean useOld)

  4. Force the tasks of a given component to run on different nodes

     ConfigExtension.setTaskOnDifferentNode(Map componentConf, boolean isIsolate)

Note: componentConf here is the component's configuration; it must be added to the spout's or bolt's configuration by calling addConfigurations.

  5. Custom worker assignment

     WorkerAssignment worker = new WorkerAssignment();
     worker.addComponent(String componentName, Integer num); // add num tasks of a component to this worker
     worker.setHostName(String hostName); // force this worker onto a particular machine
     worker.setJvm(String jvm); // set the JVM options of this worker
     worker.setMem(long mem); // set the memory size of this worker
     worker.setCpu(int slotNum); // set the CPU weight
     ConfigExtension.setUserDefineAssignment(Map conf, List<WorkerAssignment> userDefines)

Note: not every parameter needs to be set for each worker; as long as the worker's attributes are legal, setting only some of them will still take effect.

  6. Force a topology to run on particular supervisors
     In practice, some machines have local services deployed (such as a local DB); to improve performance, all the tasks of the topology can be forced to run on those machines.

     conf.put(Config.ISOLATION_SCHEDULER_MACHINES, List<String> isolationHosts)

Here, conf is the topology's configuration.
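To make the options above concrete, here is a minimal, illustrative sketch that combines them when building a topology. The ConfigExtension and WorkerAssignment calls are the ones listed above; the import paths, host names, memory and CPU values, and the CustomScheduleExample class are assumptions for illustration, not taken from the original article. BlueSpout and GreenBolt are the user-defined components from the scheduling example earlier.

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

// The JStorm import paths below are assumed from typical JStorm packaging and may differ between versions.
import com.alibaba.jstorm.client.ConfigExtension;
import com.alibaba.jstorm.client.WorkerAssignment;

public class CustomScheduleExample {
    public static void main(String[] args) throws Exception {
        Config conf = new Config();
        conf.setNumWorkers(2);

        // Worker-level defaults (values are illustrative; memory is assumed to be in bytes).
        ConfigExtension.setMemSizePerWorker(conf, 2L * 1024 * 1024 * 1024); // 2 GB per worker
        ConfigExtension.setCpuSlotNumPerWorker(conf, 2);                    // CPU weight per worker
        ConfigExtension.setUseOldAssignment(conf, false);                   // do not reuse the old assignment

        // Pin one worker to a specific (hypothetical) machine and give it its own resources.
        WorkerAssignment worker = new WorkerAssignment();
        worker.addComponent("green-bolt", 2);             // run 2 green-bolt tasks in this worker
        worker.setHostName("10.0.0.12");
        worker.setMem(4L * 1024 * 1024 * 1024);           // 4 GB for this worker
        worker.setCpu(4);
        ConfigExtension.setUserDefineAssignment(conf, Arrays.asList(worker));

        // Restrict the whole topology to a set of (hypothetical) supervisor hosts.
        conf.put(Config.ISOLATION_SCHEDULER_MACHINES, Arrays.asList("10.0.0.12", "10.0.0.13"));

        // Component-level option: force green-bolt tasks onto different nodes,
        // attached via addConfigurations as noted in the list above.
        Map<String, Object> greenBoltConf = new HashMap<String, Object>();
        ConfigExtension.setTaskOnDifferentNode(greenBoltConf, true);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("blue-spout", new BlueSpout(), 2);
        builder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout")
               .addConfigurations(greenBoltConf);

        StormSubmitter.submitTopology("custom-schedule-topology", conf, builder.createTopology());
    }
}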

If you reproduce this article, please credit the source.

Origin: www.cnblogs.com/lijianming180/p/12026668.html