Introduction to Storm

Storm

----------------------------------------

Storm is a distributed, real-time computation system.

Typical use cases: real-time analytics, online machine learning, and continuous computation.

It is fast, capable of processing upwards of a million tuples per second per node.

It is stateless: cluster state and distributed-environment metadata are kept in ZooKeeper (ZK).

It guarantees that every message is processed at least once.

 

 

Storm core concepts

--------------------------------------

1. Tuple: the main data structure in Storm, an ordered list of elements. A tuple's fields can hold data of any type; the values, written as a comma-separated list, are sent into Storm for computation.

2. Stream: an unbounded sequence of tuples.

3. Spout: the source of streams (literally, a faucet). A spout reads data from an external source, such as a message queue or an API, and emits it into the topology as tuples.

4. Bolt: the logical processing unit. Spouts pass data to bolts; a bolt processes the tuples it receives and emits new tuples, performing operations such as filtering, aggregation, and grouping. Flow: receive tuples --> process --> emit to downstream bolts (possibly several).

5. Topology: spouts and bolts connected together form a topology. Put simply, a topology is a directed graph whose vertices are computations and whose edges are data streams.

6. Task: the execution of a spout or a bolt. At a given time, each spout and bolt can have multiple instances running in multiple separate threads.

7. Workers: the worker nodes that execute tasks. Storm distributes tasks evenly among all workers; a worker listens for jobs and starts or stops processes as jobs arrive.

8. Stream grouping: controls how tuples are routed through the topology. Storm ships with several built-in strategies; the four most common are listed here (see the sketch after this list):

Shuffle grouping: tuples are distributed randomly and evenly across the target bolt's tasks.

Fields grouping: tuples are partitioned by the value of the specified field(s); tuples with equal values always go to the same task.

Global grouping: all tuples are routed to a single task (the one with the lowest task id).

All grouping: a copy of every tuple is sent to all tasks of the target bolt.
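
As a concrete illustration, here is a minimal sketch of how these groupings are declared when wiring a topology. TopologyBuilder, shuffleGrouping, and fieldsGrouping are the standard Storm Java API; SentenceSpout, SplitBolt, and CountBolt are hypothetical classes assumed for the example:

    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.tuple.Fields;

    public class GroupingExample {
        public static void main(String[] args) {
            TopologyBuilder builder = new TopologyBuilder();

            // Hypothetical spout emitting sentence tuples.
            builder.setSpout("sentences", new SentenceSpout());

            // Shuffle grouping: sentences are spread randomly and evenly
            // across the two SplitBolt tasks.
            builder.setBolt("split", new SplitBolt(), 2)
                   .shuffleGrouping("sentences");

            // Fields grouping: every tuple with the same "word" value goes
            // to the same CountBolt task, so per-word state stays consistent.
            builder.setBolt("count", new CountBolt(), 2)
                   .fieldsGrouping("split", new Fields("word"));
        }
    }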

 

 

Storm components

------------------------------------------------

a. Nimbus: the master node

The master node is responsible for distributing data among the worker nodes, assigning tasks to them, and monitoring for failures.

 

b. Supervisor: the worker-node daemon

A supervisor owns multiple worker processes and governs them to complete the tasks assigned by Nimbus.

 

c. Worker process  

A worker process executes tasks for one specific topology. It does not run tasks itself; instead it creates executors and asks them to perform particular tasks. A worker process can have multiple executors.

 

d. Executor  

An executor is a single thread spawned by a worker process. An executor runs one or more tasks, but only for a specific spout or bolt.

 

e. Task

A task performs the actual data processing; it is the execution of a spout or a bolt. The sketch below shows how workers, executors, and tasks are configured.
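
To make the worker/executor/task relationship concrete, here is a hedged sketch using Storm's standard Config and parallelism-hint API (SplitBolt is the same hypothetical bolt as above):

    import org.apache.storm.Config;
    import org.apache.storm.topology.TopologyBuilder;

    public class ParallelismExample {
        public static void main(String[] args) {
            Config conf = new Config();
            // Two worker processes (JVMs) will be launched for this topology.
            conf.setNumWorkers(2);

            TopologyBuilder builder = new TopologyBuilder();
            // Parallelism hint 4 = four executors (threads) for this bolt;
            // setNumTasks(8) = eight task instances, i.e. two tasks per executor.
            builder.setBolt("split", new SplitBolt(), 4)
                   .setNumTasks(8);
        }
    }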

 

 

Storm workflow

------------------------------------

1. Nimbus waits for a new topology to be submitted.

2. Once a topology is submitted, Nimbus processes it and gathers all the tasks to be carried out, along with the order in which they should execute.

3. Nimbus distributes the tasks evenly among all available supervisors.

4. At regular intervals, each supervisor sends a heartbeat to Nimbus to signal that it is still alive.

5. If a supervisor stops sending heartbeats (i.e., it has died), Nimbus reassigns its tasks to other supervisors.

6. If Nimbus itself dies, the supervisors keep executing their already-assigned tasks without interruption.

7. Once all its tasks are completed, a supervisor waits for new tasks.

8. Meanwhile, the dead Nimbus is restarted automatically by service-monitoring tools.

9. The restarted Nimbus continues from where it stopped. Storm guarantees that every task is processed at least once (see the ack sketch after this list).

10. Once all topologies have finished, Nimbus waits for a new topology to arrive, and the supervisors likewise wait for new tasks.
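
The at-least-once guarantee mentioned in step 9 rests on Storm's anchoring/ack protocol: a bolt anchors each emitted tuple to the input it came from and acks the input only after fully processing it, so any un-acked tuple is replayed from the spout. A minimal sketch, assuming the input carries a "sentence" field (BaseRichBolt, OutputCollector, and the emit/ack calls are the standard API):

    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class SplitBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map<String, Object> conf, TopologyContext ctx,
                            OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            for (String word : input.getStringByField("sentence").split(" ")) {
                // Anchor the new tuple to its input so failures are replayed.
                collector.emit(input, new Values(word));
            }
            // Ack only after all derived tuples are emitted (at-least-once).
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }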

 

 

Storm modes of operation

------------------------------------

1. Local mode: the topology runs inside a single JVM, which is convenient for development and testing.

2. Production mode: the topology is submitted to a running Storm cluster. Both modes are sketched below.
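
A hedged sketch of both submission paths, using the standard LocalCluster and StormSubmitter APIs (the topology name "demo" and SentenceSpout are assumptions for the example; try-with-resources on LocalCluster requires Storm 2.x, while 1.x uses cluster.shutdown()):

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;

    public class SubmitExample {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("sentences", new SentenceSpout());  // hypothetical spout
            Config conf = new Config();

            if (args.length == 0) {
                // Local mode: the whole topology runs inside this JVM.
                try (LocalCluster cluster = new LocalCluster()) {
                    cluster.submitTopology("demo", conf, builder.createTopology());
                    Thread.sleep(10_000);  // let it run briefly, then shut down
                }
            } else {
                // Production mode: hand the topology to Nimbus on the cluster.
                StormSubmitter.submitTopology("demo", conf, builder.createTopology());
            }
        }
    }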

 

 

Reference:

https://www.tutorialspoint.com/apache_storm/index.htm
