Storm in Practice (2): The role of ZMQ and Netty in Storm


    When I installed my Storm cluster I chose Storm 1.0.0. According to the official website, the only dependencies Storm 1.0.0 requires are JDK 1.6+ and Python 2.6.6. While browsing blogs, however, I came across a post whose author had chosen Storm 0.8.1 and said that ZMQ was also needed. That confused me at the time: is ZMQ required or not?


1. Storm's messaging

    In Storm, the message distribution mechanism is declared explicitly when the Topology is defined. In other words, the application developer has to state clearly how the Bolts relate to one another and how a downstream Bolt obtains the Tuples emitted by an upstream Bolt. Storm has six message distribution modes:

  • Shuffle Grouping : Random grouping. Storm tries to distribute the Tuples evenly across the tasks of the downstream Bolt.
  • Fields Grouping : Grouping by field. For example, when grouping by userid, Tuples with the same userid always go to the same task of the Bolt. This is very helpful for applications like WordCount.
  • All Grouping : Broadcast. Every task of the Bolt receives every Tuple. Use this mode with caution, because it can waste a great deal of resources.
  • Global Grouping : Global grouping. The entire stream goes to a single task of the Bolt (the task with the lowest id). This is useful for implementing transactional Topologies.
  • None Grouping : No grouping. The stream does not care which task receives its Tuples. At present this has the same effect as Shuffle Grouping. The difference is that Storm is meant to run such a Bolt, where possible, in the same thread as the component it subscribes to.
  • Direct Grouping : Direct grouping, a special grouping method. The sender of a message decides which task of the receiver handles that message.
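
    The routing rules in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not Storm code: the function names are made up, tasks are plain indexes from 0 to num_tasks - 1, and zlib.crc32 stands in for whatever hash Storm applies to the grouping fields. None Grouping is left out because it currently behaves like Shuffle Grouping.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng):
    """Pick a random downstream task so the load spreads roughly evenly."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, num_tasks, field):
    """Hash the grouping field so equal values always reach the same task.

    crc32 is an assumption made for this sketch, not Storm's real hash.
    """
    key = str(tup[field]).encode("utf-8")
    return [zlib.crc32(key) % num_tasks]


def all_grouping(tup, num_tasks):
    """Broadcast: every downstream task receives a copy of the tuple."""
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    """The whole stream goes to one task; in this sketch, index 0."""
    return [0]


def direct_grouping(tup, num_tasks, target_task):
    """The sender names the receiving task explicitly."""
    if not 0 <= target_task < num_tasks:
        raise ValueError("target task out of range")
    return [target_task]


def route_words(words, num_tasks):
    """WordCount-style routing: fields grouping on 'word', one count table per task."""
    counts = [dict() for _ in range(num_tasks)]
    for word in words:
        (task,) = fields_grouping({"word": word}, num_tasks, "word")
        counts[task][word] = counts[task].get(word, 0) + 1
    return counts
```

    With route_words, every occurrence of the same word is counted by the same task, which is exactly why Fields Grouping suits WordCount: no task ever holds a partial count for a word.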

    Key points about messaging:

    Message queues are now a very common solution for communication between modules. They let inter-process communication span physical machines, which is particularly important for distributed systems: after all, we cannot assume that two processes are deployed on the same physical machine. RabbitMQ is a widely used MQ. For more on RabbitMQ, see my column: RabbitMQ

 

    When it comes to MQ, I have to mention ZeroMQ. ZeroMQ wraps sockets. To quote the official description: "ZMQ (ZeroMQ, hereinafter ZMQ) is a simple, easy-to-use transport layer, a socket library that works like a framework and makes socket programming simpler, more concise, and faster. It is a message queuing library that scales elastically across multiple threads, cores, and hosts. ZMQ's stated goal is to 'become part of the standard networking stack and later make its way into the Linux kernel.' That has not happened yet, but it is certainly a promising and much-needed layer of wrapping over 'traditional' BSD sockets. ZMQ makes writing high-performance network applications extremely easy and fun."
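
    To make "simpler socket programming" a little more concrete: a plain stream socket only delivers bytes, so the application has to mark message boundaries itself. The sketch below does this by hand with Python's standard socket module. It is not ZeroMQ code, and the helper names (send_msg, recv_exact, recv_msg, demo) are made up for illustration. ZeroMQ sockets deliver whole messages, so this kind of bookkeeping is part of what the library takes off your hands.

```python
import socket
import struct


def send_msg(sock, payload: bytes) -> None:
    """Prefix the payload with its 4-byte big-endian length, then send it."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes; a stream socket may return fewer per recv call."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed before the message was complete")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock) -> bytes:
    """Read one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def demo():
    """Send two messages over a connected socket pair and read them back."""
    a, b = socket.socketpair()
    try:
        send_msg(a, b"tuple-1")
        send_msg(a, b"tuple-2")
        return [recv_msg(b), recv_msg(b)]
    finally:
        a.close()
        b.close()
```

    Without the length prefix, the receiver could not tell where "tuple-1" ends and "tuple-2" begins, because both arrive as one undifferentiated byte stream.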

 

    So ZeroMQ is not an MQ in the traditional sense. It is better suited to communication between nodes and between nodes and the Master. In Storm 0.8.x and earlier, workers communicate with each other through ZeroMQ. So why did 0.9 replace ZeroMQ with Netty? "Replace" is not quite the right word: in 0.9 the default communication between workers uses Netty, and ZeroMQ is still supported. According to the Storm project, ZeroMQ has the following shortcomings:

  • Not easy to deploy , especially in cloud environments: ZMQ is a native library (written in C++), so it depends closely on the operating system environment.
  • Its memory use cannot be limited . The memory used by Java code is easy to limit through the JVM, but ZMQ is a black box to Storm.
  • Storm cannot get information from ZMQ . For example, Storm has no way of knowing how much data is currently waiting in the send buffer.

    There is also the question of performance, which is covered on the Netty author's blog. The conclusion there is that Netty performs twice as well as ZMQ (in its default configuration). I don't know what exactly ZMQ's default configuration was in that test, and the result surprised me. Beyond that, because Netty is implemented in Java, it is much easier to add authorization and authentication to the communication between workers, which is genuinely hard to do with ZMQ.

 

2. The role of ZMQ in Storm 0.8.x and earlier

  • In Storm 0.8.x and earlier, ZMQ carries the communication between workers, including workers running on different nodes.
  • From Storm 0.9 on, the default communication between workers uses Netty , and ZeroMQ is still supported.

 

    To sum up: as long as you choose Storm 0.9 or later, you do not need to install ZMQ, because those versions use Netty by default. If you insist on using ZMQ, you have to install it yourself.
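
    The rule of thumb above can be written down as a tiny helper. This is just a sketch of the decision described in this post: the function names and the force_zmq flag are hypothetical, and the flag stands in for however the transport is switched in your setup (in the 0.9.x line this is, as far as I know, the storm.messaging.transport setting in storm.yaml, so check the defaults of your own version).

```python
def parse_version(version: str):
    """Turn '0.8.1', '0.9.2-incubating' or '1.0.0' into (major, minor)."""
    return tuple(int(part) for part in version.split(".")[:2])


def needs_zeromq(storm_version: str, force_zmq: bool = False) -> bool:
    """Return True if ZMQ has to be installed for this Storm version.

    Encodes the rule from this post: before 0.9 workers talk over ZMQ,
    from 0.9 on Netty is the default unless ZMQ is explicitly chosen.
    """
    if parse_version(storm_version) < (0, 9):
        return True
    return force_zmq
```

    For the two versions mentioned at the start of this post, needs_zeromq("0.8.1") is True and needs_zeromq("1.0.0") is False, which answers the original question.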

 
