Storm 2.0 source code analysis -- task submission

1. Overall description

The entire code analysis is based on storm-2.0.
The whole process can be divided into 5 steps:
1. The user runs the storm jar command to submit the topology to Nimbus
2. A timer thread in Nimbus periodically checks whether there are tasks that need to be scheduled
3. When there are, Nimbus sends a message to the Supervisor
4. The Supervisor starts a LogWriter process
5. The LogWriter starts the actual Worker process

2. Task submission

User tasks are submitted to Nimbus. This can be divided into the following steps: client-side processing, Nimbus receiving the topology, and the timer task processing the topology.

2.1 Client processing

The client program involves a few commonly used concepts: TopologyBuilder, spout, and bolt.

2.1.1 Topology creation

The following program is the WordCount example:

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new RandomSentenceSpout(), 5);
        builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
        builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

This is the process of building a topology. There are several fields in TopologyBuilder that deserve attention:

    private final Map<String, IRichBolt> _bolts = new HashMap<>();
    private final Map<String, IRichSpout> _spouts = new HashMap<>();
    private final Map<String, ComponentCommon> commons = new HashMap<>();
    private final Map<String, Set<String>> _componentToSharedMemory = new HashMap<>();
    private final Map<String, SharedMemory> _sharedMemory = new HashMap<>();

Our bolt and spout information is stored in the corresponding maps, and there is also a commons map that requires special attention: every spout and bolt is recorded in it as well. This can be seen from setSpout() or setBolt():

    public SpoutDeclarer setSpout(String id, IRichSpout spout, Number parallelism_hint) throws IllegalArgumentException {
        validateUnusedId(id);
        initCommon(id, spout, parallelism_hint); // this is where the component is registered in commons
        _spouts.put(id, spout);
        return new SpoutGetter(id);
    }
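
For reference, after the builder calls above, the client hands the topology to Nimbus through StormSubmitter. A minimal sketch (the topology name and worker count are placeholder values for illustration, not from the original example):

    // Minimal client-side submission sketch: configure the topology and submit it to the cluster.
    // "word-count" and the worker count are arbitrary choices.
    Config conf = new Config();
    conf.setNumWorkers(3);
    StormSubmitter.submitTopology("word-count", conf, builder.createTopology());

This is the call that the storm jar command ultimately triggers; it uploads the jar and then invokes the submission RPC on Nimbus, which is where the next section picks up.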

2.2 Nimbus receiving topology

The function submitTopologyWithOpts() in Nimbus.java is the code that actually handles task submission. Its flow is as follows (only the main parts are listed):

Nimbus::submitTopologyWithOpts( ... )
{
    //1. Validate the parameters, assemble the topologyId, etc.

    //2. Upload the jar package to Nimbus
    LOG.info("uploadedJar {}", uploadedJarLocation);
    setupStormCode(conf, topoId, uploadedJarLocation, totalConfToSave, topology);

    //3. Create the directories the topology needs on zookeeper
    state.setupHeartbeats(topoId, topoConf);
    state.setupErrors(topoId, topoConf);

    //4. Call startTopology()
    startTopology(topoName, topoId, status, topologyOwner, topologyPrincipal);
}

Let's take a look at the startTopology function:

private void startTopology( ... )
{
    //1. Work out the numExecutors information
    StormTopology topology = StormCommon.systemTopology(topoConf, readStormTopology(topoId, topoCache));
    Map<String, Integer> numExecutors = new HashMap<>();
    for (Entry<String, Object> entry : StormCommon.allComponents(topology).entrySet()) {
        numExecutors.put(entry.getKey(), StormCommon.numStartExecutors(entry.getValue()));
    }
    //Note that besides the user-defined bolts and spouts, there are two special bolts here named __acker and __system; __acker in particular will be used later when Ack messages are handled
    //2. Set various parameters, such as the current time, status, submitter, etc.

    //3. Call the activation function
    state.activateStorm(topoId, base, topoConf);
}
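
To see the __acker and __system point concretely, the same calls can be reused outside of Nimbus. Below is a small sketch, assuming topoConf and builder are the configuration and TopologyBuilder from the WordCount example in 2.1.1 (the helper method itself is hypothetical, not Nimbus code):

    // Hedged sketch: list the components after systemTopology() has injected the system bolts.
    void printComponents(Map<String, Object> topoConf, TopologyBuilder builder) throws Exception {
        StormTopology sysTopology = StormCommon.systemTopology(topoConf, builder.createTopology());
        for (Entry<String, Object> entry : StormCommon.allComponents(sysTopology).entrySet()) {
            // expected keys: spout, split, count, plus __acker, __system, ...
            System.out.println(entry.getKey() + " -> " + StormCommon.numStartExecutors(entry.getValue()));
        }
    }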

But when we look at the activateStorm function, we find that it does not interact with the Supervisor at all. Its code is as follows:

    public void activateStorm(String stormId, StormBase stormBase, Map<String, Object> topoConf) {
        String path = ClusterUtils.stormPath(stormId);
        stateStorage.mkdirs(ClusterUtils.STORMS_SUBTREE, defaultAcls); // create the corresponding directory on zookeeper (the storms subtree)
        stateStorage.set_data(path, Utils.serialize(stormBase), ClusterUtils.mkTopoReadOnlyAcls(topoConf));
        this.assignmentsBackend.keepStormId(stormBase.get_name(), stormId);
    }

Looking at the zookeeper data, we can see:

[zk: localhost:2181(CONNECTED) 2] ls /storm/assignments
[start-topology] 

At this point, the function returns a success response to the client, but the topology is not yet running on any supervisor node. To sum up, these two functions do four things:
1. Various parameter validation and parameter assembly
2. Upload the jar package from the client to the nimbus node
3. Create the corresponding directories on zookeeper, such as /storm/assignments
4. Save the fully-parameterized topology into the stormClusterState object.

2.3 Timer task processing the topology

Before the topology's tasks actually run, there is still the job of choosing workers (that is, deciding which supervisor nodes the worker processes will be started on). This work is done mainly by the function Nimbus::mkAssignments(), and that function is invoked by a StormTimer. Its call stack is as follows:

    org.apache.storm.daemon.nimbus.Nimbus.mkAssignments() 2,078 <- 
    org.apache.storm.daemon.nimbus.Nimbus.mkAssignments() 2,003 <- 
    org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$29() 2,701 <- 
    org.apache.storm.StormTimer$1.run() 111 <- 
    org.apache.storm.StormTimer$StormTimerTask.run() 227
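
The lambda$launchServer frame in this stack is registered when Nimbus launches. Below is a minimal sketch of what that registration roughly looks like, assuming StormTimer exposes a scheduleRecurring(delaySecs, recurSecs, Runnable) method and that the interval comes from DaemonConfig.NIMBUS_MONITOR_FREQ_SECS; both are assumptions for illustration, not quotes from the source:

    // Hedged sketch of the timer registration inside Nimbus.launchServer():
    // every NIMBUS_MONITOR_FREQ_SECS seconds the timer calls mkAssignments() again.
    StormTimer timer = new StormTimer("nimbus-timer", (thread, error) -> LOG.error("Timer thread died", error));
    timer.scheduleRecurring(0, ObjectReader.getInt(conf.get(DaemonConfig.NIMBUS_MONITOR_FREQ_SECS)), () -> {
        try {
            mkAssignments();   // the call seen at the top of the stack above
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    });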

Let's take a look at mkAssignments():

private void mkAssignments(String scratchTopoId) throws Exception {

    // 1. Note that this stormClusterState is the same object we used earlier in startTopology()
    IStormClusterState state = stormClusterState;
    ... ...

    //2. Compute the node information for the deployment
    newSchedulerAssignments = computeNewSchedulerAssignments(existingAssignments, topologies, bases, scratchTopoId);
    ... ...

    //3. Start the deployment
    notifySupervisorsAssignments(newAssignments, assignmentsDistributer, totalAssignmentsChangedNodes,
                                     basicSupervisorDetailsMap);

    ... ...
}

The notifySupervisorsAssignments() function deserves special mention: this is where the new assignments are pushed to the Supervisors over RPC, and it is this message that the Supervisor receives in the next section.

2.4 Supervisor node receives messages

When the Supervisor process starts, it starts a thread for SynchronizeAssignments, which we can see from the Supervisor log:

2018-05-30 08:02:06.825 o.a.s.u.NimbusClient Thread-4 [INFO] Found leader nimbus : node129:6627
2018-05-30 08:02:06.826 o.a.s.d.s.t.SynchronizeAssignments Thread-4 [DEBUG] Sync an assignments from master, will start to sync with assignments: SupervisorAssignments(storm_assignment:{})

The code corresponding to this log line is SynchronizeAssignments::getAssignmentsFromMaster(), which looks like this:

    public void getAssignmentsFromMaster(Map conf, IStormClusterState clusterState, String node) {
        ... ...
        SupervisorAssignments assignments = master.getClient().getSupervisorAssignments(node);
        LOG.debug("Sync an assignments from master, will start to sync with assignments: {}", assignments);
        assignedAssignmentsToLocal(clusterState, assignments);
        ... ...
    }

Clearly, the Supervisor remotely calls Nimbus::getSupervisorAssignments() to obtain its task assignments, and then calls assignedAssignmentsToLocal() to apply them locally. Looking at the implementation of assignedAssignmentsToLocal(), it eventually calls into StormClusterStateImpl::syncRemoteAssignments():

    public void syncRemoteAssignments(Map<String, byte[]> remote) {
        if (null != remote) {
            // assignments were pushed from Nimbus: store them in the local backend
            this.assignmentsBackend.syncRemoteAssignments(remote);
        } else {
            // nothing was pushed: fall back to reading every assignment from zookeeper
            Map<String, byte[]> tmp = new HashMap<>();
            List<String> stormIds = this.stateStorage.get_children(ClusterUtils.ASSIGNMENTS_SUBTREE, false);
            for (String stormId : stormIds) {
                byte[] assignment = this.stateStorage.get_data(ClusterUtils.assignmentPath(stormId), false);
                tmp.put(stormId, assignment);
            }
            this.assignmentsBackend.syncRemoteAssignments(tmp);
        }
    }
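
For completeness, here is a rough sketch of what assignedAssignmentsToLocal() does before handing off to syncRemoteAssignments(). The field access is based on the SupervisorAssignments(storm_assignment:{}) structure visible in the log above; it is an illustration, not a verbatim copy of the source:

    // Hedged sketch: serialize each Assignment pushed by Nimbus and hand the whole map
    // to the cluster state; a null push lets syncRemoteAssignments() fall back to
    // reading zookeeper, as shown above.
    void assignedAssignmentsToLocal(IStormClusterState clusterState, SupervisorAssignments assignments) {
        if (assignments == null || assignments.get_storm_assignment() == null) {
            clusterState.syncRemoteAssignments(null);
            return;
        }
        Map<String, byte[]> serialized = new HashMap<>();
        for (Map.Entry<String, Assignment> entry : assignments.get_storm_assignment().entrySet()) {
            serialized.put(entry.getKey(), Utils.serialize(entry.getValue()));
        }
        clusterState.syncRemoteAssignments(serialized);
    }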

3. Summary

The whole submission process can be divided into three parts:
1) On the client, spouts and bolts are handled in a unified way and all end up recorded in the same commons map
2) After the client submits the task to Nimbus, Nimbus does not create the tasks immediately; instead, a timer thread in Nimbus is responsible for selecting the Supervisor nodes that will execute them
3) The Supervisor also starts a thread that fetches the assignment information from Nimbus through remote RPC calls.

Origin: blog.csdn.net/eyoulc123/article/details/81226360