Understanding Storm

1. Storm stream processing:

                            Storm vs. MapReduce

                            Storm: for real-time processing

                                               Disadvantages: lower throughput

                                               Advantages: good timeliness (millisecond-level latency), incremental processing

                            MapReduce: batch-oriented

                                               Disadvantages: poor timeliness

                                               Advantages: strong throughput for batch processing
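The difference between the two models can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Storm or MapReduce code: `batch_word_count` and `StreamingWordCount` are hypothetical names, and the three sample lines are made up. The batch function needs the whole data set before it returns anything, while the streaming class updates its result incrementally as each record arrives.

```python
from collections import Counter


def batch_word_count(lines):
    """Batch style: take the complete data set, then count it in one pass."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


class StreamingWordCount:
    """Stream style: update the running result as each record arrives."""

    def __init__(self):
        self.counts = Counter()

    def process(self, line):
        # Incremental update: only the new record is touched.
        self.counts.update(line.split())
        return self.counts  # usable immediately after every record


# Made-up sample data for the illustration.
lines = ["storm is fast", "mapreduce is batch", "storm is incremental"]

stream = StreamingWordCount()
for line in lines:
    stream.process(line)

batch_result = batch_word_count(lines)
```

Both approaches end with the same counts; what differs is when a result becomes available.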

 

2. Storm: no persistence layer, which is why it is "fast"

                            Reliability: guaranteed message processing

                            Local Mode

                            Primitives: spout and bolt

                           

3. Storm basic concepts:

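                            1) Stream: an unbounded sequence of tuples

                            2) Tuple: the basic data unit

                            3) Topology: the network (graph) of spouts and bolts

                                                        Grouping: Shuffle / Fields

                            4) Spout: message producer

                                                        Can connect many kinds of data sources to a stream

                                                        Collects the ack / fail results of message processing

                            5) Bolt: message processing logic

                                                        Filtering, calling external services, data formatting, aggregation, summarization, ...

                                                        Can emit multiple streams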
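The concepts above can be tied together with a small simulation in plain Python. It does not use the Storm API: every function name here is hypothetical, the sentences are made up, and CRC32 is only an assumed stand-in for whatever hash a real fields grouping uses. It shows a spout producing tuples, a bolt that emits several tuples per input, a shuffle grouping that picks a task at random, and a fields grouping that routes equal field values to the same task.

```python
import random
import zlib
from collections import Counter, defaultdict


def sentence_spout():
    """Spout: the source of the stream; yields one tuple per sentence."""
    for sentence in ["the cow jumped", "the moon", "the cow"]:
        yield (sentence,)


def split_bolt(tup):
    """Bolt: turns one sentence tuple into several word tuples."""
    for word in tup[0].split():
        yield (word,)


def shuffle_grouping(tup, num_tasks, rng):
    """Shuffle grouping: pick a target task at random to spread the load."""
    return rng.randrange(num_tasks)


def fields_grouping(tup, num_tasks, field_index=0):
    """Fields grouping: equal field values always map to the same task."""
    key = str(tup[field_index]).encode("utf-8")
    return zlib.crc32(key) % num_tasks  # stable hash (assumed stand-in)


def run_topology(num_split_tasks=2, num_count_tasks=3, seed=0):
    rng = random.Random(seed)
    split_load = Counter()          # sentences handled per split task
    counts = defaultdict(Counter)   # word counts held by each count task
    for sentence in sentence_spout():
        split_load[shuffle_grouping(sentence, num_split_tasks, rng)] += 1
        for word in split_bolt(sentence):
            task = fields_grouping(word, num_count_tasks)
            counts[task][word[0]] += 1
    return split_load, counts


split_load, counts = run_topology()
owners_of_the = [task for task, c in counts.items() if "the" in c]
```

Because of the fields grouping, a single count task sees every occurrence of a given word, so its local count is also the global count.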

        

 

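4. Common patterns:

                   (1) Stream processing

                   (2) Continuous computation: e.g. iterative machine learning

                   (3) Distributed RPC: runs as a standalone service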

5. Architecture:

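                            Master: Nimbus: assigns work

                                                        If it goes down: after a restart it carries on as if nothing had happened, because it is stateless (fail-fast)

                                                                This means you can kill the Nimbus and Supervisor processes with kill -9 and then restart them as if nothing had happened. This design makes Storm exceptionally stable.

 

                            Slave (worker node): Supervisor: monitors work

                                                                 Fail-fast; monitors the Worker processes

                                                                 Worker: a worker process

                                                                 Task: the unit of work, run on a thread

                                                                                      Every spout or bolt instance runs as a task

                                                                                      Executor: a thread inside a Worker; it can hold several tasks but executes only one of them at a time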

                            Zookeeper: coordination and management
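Why a stateless, fail-fast daemon can be killed and restarted safely can be sketched as follows. This is a toy model, not Storm's implementation: `CoordinationStore` is a hypothetical in-memory stand-in for Zookeeper, the `Nimbus` class and its round-robin assignment are assumptions made for the illustration, and the topology, task, and supervisor names are invented. The point is only that the daemon keeps no state of its own, so a fresh process sees exactly what the old one left in the store.

```python
class CoordinationStore:
    """Stand-in for Zookeeper: state that outlives any single daemon process."""

    def __init__(self):
        self.assignments = {}


class Nimbus:
    """Stateless master: everything it knows lives in the external store."""

    def __init__(self, store):
        self.store = store

    def assign(self, topology, tasks, supervisors):
        # Round-robin the tasks over the supervisors (assumed policy)
        # and persist the plan in the store.
        plan = {
            task: supervisors[i % len(supervisors)]
            for i, task in enumerate(tasks)
        }
        self.store.assignments[topology] = plan
        return plan

    def assignments(self, topology):
        return self.store.assignments[topology]


store = CoordinationStore()
nimbus = Nimbus(store)
before = nimbus.assign(
    "wordcount", ["spout-1", "split-1", "count-1"], ["sup-a", "sup-b"]
)

del nimbus               # simulate kill -9: the process and its memory are gone
nimbus = Nimbus(store)   # restart: a fresh process attached to the same store
after = nimbus.assignments("wordcount")
```

The restarted instance returns the same assignments without any recovery step, which is the behaviour the notes describe as "as if nothing had happened".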


6. Fault tolerance:

                   Architectural fault tolerance

                   Data fault tolerance:

                                               (1) timeout

                                               (2) ack mechanism: in essence a special task
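The two data fault-tolerance mechanisms can be sketched together. Storm's acker task tracks each tuple tree with an XOR checksum of 64-bit tuple ids: every id is XORed in once when the tuple is emitted and once when it is acked, so the checksum returns to zero exactly when the whole tree has been processed. The class below is a simplified simulation of that idea, not Storm's code: the `Acker` class, its method names, and the message ids are hypothetical, and the 30-unit timeout is just a value chosen for the example.

```python
import random


class Acker:
    """Tracks each tuple tree with one XOR checksum per root message."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = {}  # root_id -> [checksum, start_time]

    def init(self, root_id, tuple_id, now):
        """Called when the spout emits a new root tuple."""
        self.pending[root_id] = [tuple_id, now]

    def ack(self, root_id, acked_id, emitted_ids=()):
        """XOR in the acked id and any new child ids; zero means done."""
        entry = self.pending.get(root_id)
        if entry is None:
            return None  # already completed or timed out
        entry[0] ^= acked_id
        for child in emitted_ids:
            entry[0] ^= child
        if entry[0] == 0:
            del self.pending[root_id]
            return "ack"
        return None

    def expire(self, now):
        """Return the root ids whose tree did not finish within the timeout."""
        failed = [
            root for root, (_, start) in self.pending.items()
            if now - start > self.timeout
        ]
        for root in failed:
            del self.pending[root]
        return failed


rng = random.Random(42)


def new_id():
    return rng.getrandbits(64)


acker = Acker(timeout=30)

# msg-1: the root tuple is acked by a bolt that emits two child tuples.
root, child_a, child_b = new_id(), new_id(), new_id()
acker.init("msg-1", root, now=0)
r1 = acker.ack("msg-1", root, emitted_ids=[child_a, child_b])
r2 = acker.ack("msg-1", child_a)
r3 = acker.ack("msg-1", child_b)  # last ack brings the checksum to zero

# msg-2: never acked, so it is reported as failed once the timeout passes.
acker.init("msg-2", new_id(), now=0)
timed_out = acker.expire(now=31)
```

In this sketch a completed tree yields `"ack"` and an expired one is returned by `expire`, which corresponds to the spout receiving ack or fail for the original message and deciding whether to replay it.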


Origin www.cnblogs.com/chen8023miss/p/11205211.html