Big Data Learning (Storm): Detailed Principles

Roles

Client

The client's main role is to submit the topology to the cluster.

Worker

A Worker is an independent JVM process running on a Supervisor node. Its main function is to run a subset of a topology's tasks. A topology can be spread across multiple Workers, but a Worker belongs to exactly one topology.

Executor

An Executor is a thread running inside a Worker. One Executor can run one or more Tasks, and every Task (of a Spout or Bolt) runs inside exactly one Executor.
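To make the executor-to-task relationship concrete, here is a minimal sketch (not Storm's actual code; all names are invented for illustration) of how a fixed number of Tasks can be spread evenly across Executors, which is what Storm does when the configured task count exceeds the parallelism hint:

```java
import java.util.ArrayList;
import java.util.List;

public class ExecutorTaskSplit {
    // Assign task ids 0..numTasks-1 to numExecutors executors, as evenly as possible.
    static List<List<Integer>> split(int numTasks, int numExecutors) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            executors.get(t % numExecutors).add(t); // round-robin over executors
        }
        return executors;
    }

    public static void main(String[] args) {
        // e.g. a bolt configured with 2 executors and 4 tasks:
        System.out.println(split(4, 2)); // [[0, 2], [1, 3]]
    }
}
```

With 4 tasks and 2 executors, each executor thread is responsible for 2 task instances of the same Spout or Bolt.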

Task

A Task is an instance of the processing logic: each Spout or Bolt can correspond to multiple Tasks running across the cluster, and each Task runs inside an Executor thread.
A stream grouping defines how tuples are routed from one set of Tasks to another set of Tasks.
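To illustrate the grouping idea, here is a sketch (illustrative only, not Storm's internal implementation) of the routing rule behind a fields grouping: tuples carrying the same value in the grouping field are always sent to the same downstream Task:

```java
public class FieldsGroupingSketch {
    // Pick a target task index for a tuple's grouping-field value.
    static int targetTask(Object fieldValue, int numTasks) {
        // Math.floorMod keeps the index non-negative even for negative hash codes
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        // The same word always routes to the same task, which is what lets a
        // counting bolt keep per-word state locally without coordination.
        System.out.println(targetTask("storm", numTasks) == targetTask("storm", numTasks)); // true
    }
}
```

A shuffle grouping, by contrast, would pick the target task randomly or round-robin to balance load.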

Storm cluster startup, task submission and execution process

Startup

When the user runs storm nimbus or storm supervisor, the storm script (a Python script) actually calls two Python functions, which eventually build a java command that launches the corresponding Storm Java process:

java -server xxxx.xxxx.nimbus/supervisor args
Task submission

The user runs the storm jar command with the topology jar, the driver class, and a topology name (along the lines of storm jar xxxx.jar xxxx.MainClass name); this executes the main method of the driver class.

In the driver class, the topologyBuilder.createTopology() method is called, which generates the serialized spout and bolt objects.

The client then uploads the jar containing the topology to nimbus's storm-local/nimbus/inbox directory.

Nimbus copies stormjar.jar into the /home/hadoop/storm-local/nimbus/stormdist/wordcount01-2-1525621662 directory (wordcount01-2-1525621662 is a unique topology id generated by storm), and from the serialized objects of the previous step generates the serialized task file and the serialized configuration file. At this point nimbus can assign tasks:
-rw-rw-r--. 1 hadoop hadoop    3615 May  6 23:47 stormcode.ser
-rw-rw-r--. 1 hadoop hadoop     733 May  6 23:47 stormconf.ser
-rw-rw-r--. 1 hadoop hadoop 3248667 May  6 23:47 stormjar.jar
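The directory name above follows the pattern name-counter-timestamp. A hypothetical sketch (names invented; this mirrors the naming pattern seen in storm-local, not Storm's exact code) of how such a unique topology id could be formed:

```java
public class TopologyIdSketch {
    // Combine the topology name, a submission counter, and the
    // submission time (in seconds) into a unique id.
    static String topologyId(String name, int counter, long submitTimeSecs) {
        return name + "-" + counter + "-" + submitTimeSecs;
    }

    public static void main(String[] args) {
        System.out.println(topologyId("wordcount01", 2, 1525621662L));
        // wordcount01-2-1525621662
    }
}
```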
Next, nimbus performs the task assignment. When the assignment is complete, an assignment object is generated, serialized, and saved under the /storm/assignments/wordcount01-2-1525621662 node in zookeeper.
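As a sketch of the scheduling idea (illustrative names; Storm's real scheduler is more involved): nimbus spreads the topology's executors round-robin across the available supervisor worker slots, and the resulting mapping is what gets serialized into the /storm/assignments node:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AssignmentSketch {
    // slots are "host:port" worker slots; returns executorId -> slot
    static Map<Integer, String> assign(int numExecutors, List<String> slots) {
        Map<Integer, String> assignment = new LinkedHashMap<>();
        for (int e = 0; e < numExecutors; e++) {
            assignment.put(e, slots.get(e % slots.size())); // round-robin over slots
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> slots = List.of("node01:6700", "node01:6701", "node02:6700");
        System.out.println(assign(4, slots));
        // {0=node01:6700, 1=node01:6701, 2=node02:6700, 3=node01:6700}
    }
}
```

Because each entry names a supervisor host and port, every supervisor can tell from the assignment which workers it must launch.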

The supervisors perceive the change in the /storm/assignments directory through zookeeper's watch mechanism, and each pulls down the portion of the topology assigned to it (when nimbus makes the assignment, it specifies which supervisor each task belongs to).

The supervisor then starts a worker on the specified port according to the pulled information, which again amounts to executing a java command:

java -server xxxxx.xxxx.worker
After the worker starts, it starts to execute according to the assigned task information.
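A minimal sketch of what an executor thread then does (illustrative only; Storm's real executors use Disruptor queues and a richer Tuple type): drain messages from the worker's incoming queue and hand each one to the task's processing logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ExecutorLoopSketch {
    interface Task { void execute(String tuple); } // stand-in for a Bolt's logic

    // Drain whatever is currently queued and pass each tuple to the task.
    static List<String> drain(BlockingQueue<String> in, Task task) {
        List<String> processed = new ArrayList<>();
        String tuple;
        while ((tuple = in.poll()) != null) { // non-blocking receive
            task.execute(tuple);              // hand off to the task's logic
            processed.add(tuple);
        }
        return processed;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(List.of("a", "b"));
        List<String> seen = drain(queue, tuple -> { /* e.g. counting logic */ });
        System.out.println(seen); // [a, b]
    }
}
```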

