Storm's fault tolerance

Storm runs several different daemons: Nimbus, which schedules work across the cluster; the Supervisor, which starts and kills workers on each node; the logviewer, which provides access to worker logs; and the UI, which displays cluster status.

 

1. Q: What happens if a worker dies?

A: If a worker dies, the supervisor will restart it. If the worker repeatedly fails on startup and is unable to heartbeat to Nimbus, Nimbus will reassign the worker to another machine.
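
A worker crash also affects in-flight tuples: anything the dead worker was processing goes unacked, and once the topology's message timeout expires the spout replays it (assuming acking is enabled). Below is a minimal sketch of the relevant topology settings using org.apache.storm.Config; the class name and the values are illustrative, not recommendations:

```java
import org.apache.storm.Config;

public class RecoveryConfig {
    public static Config build() {
        Config conf = new Config();
        // Number of worker JVMs the supervisors will launch for this topology.
        conf.setNumWorkers(2);
        // Tuples not fully acked within this window (e.g. because their
        // worker died mid-processing) are treated as failed and are
        // replayed by the spout, provided the topology uses acking.
        conf.setMessageTimeoutSecs(30);
        return conf;
    }
}
```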

 

2. Q: What happens if a node dies?

A: Tasks assigned to that machine will time out and Nimbus will reassign them to other machines.
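
How quickly Nimbus notices the dead node is governed by cluster-level timeouts that are normally set in storm.yaml. As a minimal sketch, the snippet below only names the relevant configuration keys via their constants in org.apache.storm.Config; the actual values should be tuned in your cluster's storm.yaml:

```java
import org.apache.storm.Config;

public class NodeFailureTimeouts {
    public static void main(String[] args) {
        // Heartbeat window after which Nimbus declares a task dead and
        // reassigns it to another machine.
        System.out.println(Config.NIMBUS_TASK_TIMEOUT_SECS);       // key: nimbus.task.timeout.secs
        // Heartbeat window after which Nimbus considers the whole
        // supervisor node gone.
        System.out.println(Config.NIMBUS_SUPERVISOR_TIMEOUT_SECS); // key: nimbus.supervisor.timeout.secs
    }
}
```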

 

3. Q: What happens if Nimbus or Supervisor fails?

A: Nimbus and the Supervisors are designed to be fail-fast (the process self-destructs whenever any unexpected situation is encountered) and stateless (all state is kept in ZooKeeper or on disk). As described in Setting up a Storm cluster, Nimbus and the Supervisors must be run under supervision with a tool like daemontools or monit, so that if either dies it is restarted as if nothing happened.

 

Most notably, the death of Nimbus or a Supervisor does not affect any worker processing. This is in contrast to Hadoop, where if the JobTracker dies, all running jobs are lost.

 

4. Q: Is Nimbus a single point of failure?

A: If you lose the Nimbus node, the workers will continue to run, and the supervisors will continue to restart workers if they die. However, without Nimbus, workers won't be reassigned to other machines when necessary (for example, if you lose a worker machine).

 

Nimbus has been highly available since Storm 1.0.0. For more information, see the Nimbus HA Design documentation.

 

5. Q: How can Storm guarantee data processing?

A: Storm provides mechanisms to guarantee data processing even if a node dies or messages are lost. See the Guaranteeing Message Processing documentation for details.
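
The core of that guarantee is anchoring and acking: each bolt links its output tuples to the input tuple (forming a tuple tree) and acks the input once it is fully handled; if any tuple in the tree fails or times out, the spout replays the original message (assuming the spout emitted it with a message ID). Below is a minimal sketch of such a bolt; the class and field names are illustrative:

```java
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        for (String word : input.getString(0).split("\\s+")) {
            // Anchored emit: ties each output tuple to `input` in the
            // tuple tree, so a downstream failure triggers a replay.
            collector.emit(input, new Values(word));
        }
        // Ack only after all anchored outputs have been emitted; an
        // unacked tuple eventually times out and is replayed.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
```

On the spout side, the same guarantee requires emitting each tuple with a message ID so that Storm can invoke the spout's ack or fail callback when the tuple tree completes or times out.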

 

 
