Introduction to Storm grouping strategies

1. Storm data sources
Spout data sources:

MQ: a direct streaming data source

DB: used only for reading configuration information

File
Problems: 1. A file cannot be read by a distributed application; 2. When the Spout runs with several concurrent instances, incremental data is read repeatedly.
For log files, the usual approach is therefore: 1. Read the file content and write it to MQ; 2. Let Storm consume and process it from MQ. A minimal spout sketch for the MQ case follows.
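A minimal sketch of the MQ case, assuming Apache Storm 2.x; the class, field and stream names are hypothetical, and an in-memory BlockingQueue stands in for a real message queue (Kafka, RocketMQ, etc.):

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

/** Hypothetical spout: drains a queue that stands in for MQ and emits one line per tuple. */
public class QueueSpout extends BaseRichSpout {

    // Placeholder for a real MQ consumer; filled elsewhere in a real deployment.
    public static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>();

    private SpoutOutputCollector collector;

    @Override
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // Non-blocking poll: emit a tuple only when the queue actually has data.
        String line = QUEUE.poll();
        if (line != null) {
            collector.emit(new Values(line));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}
```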
2. Storm stream grouping
Grouping decides how tuples are distributed when a Bolt is assigned multiple Executors (multi-threading, concurrency). Note that it is not about one Spout or Bolt emitting to several different Bolts (broadcasting). Storm has 6 types of stream grouping; with a single thread the effect is equivalent to All Grouping.

1. Shuffle Grouping: round-robin style, so that each thread gets an even share. Tuples in the stream are distributed randomly, ensuring that each Bolt task receives roughly the same number of tuples.

2. None Grouping: no grouping. It currently has the same effect as Shuffle Grouping, except that the distribution is not guaranteed to be even under multi-threading.

3. Fields Grouping: grouping by field, for example grouping by "word": tuples with the same word are assigned to the same Bolt task, while different words may be assigned to different tasks. Uses: 1. Filtering, i.e. selecting some of the Fields emitted by the source (the Spout or the previous Bolt); 2. Tuples with the same field value are always handled by the same thread (Executor / task).

Typical scenarios for Fields Grouping: deduplication, Join (see the wiring sketch below).
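A minimal wiring sketch for groupings 1-3, assuming Apache Storm 2.x; the component ids and the SplitBolt/CountBolt stubs are hypothetical placeholders, and the data source is the QueueSpout from the sketch above:

```java
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class GroupingSketch {

    /** Hypothetical bolt: splits each line into words. */
    public static class SplitBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            for (String word : input.getStringByField("line").split("\\s+")) {
                collector.emit(new Values(word));
            }
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    /** Hypothetical terminal bolt: would count or deduplicate words. */
    public static class CountBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            // Count / deduplicate input.getStringByField("word") here.
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // No further output.
        }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        // Data source: the QueueSpout from the earlier sketch.
        builder.setSpout("line-spout", new QueueSpout(), 1);

        // 1. Shuffle Grouping: lines are spread evenly (randomly) over the 4 SplitBolt tasks.
        //    (2. None Grouping would be declared with .noneGrouping("line-spout") and
        //    currently behaves the same way.)
        builder.setBolt("split-bolt", new SplitBolt(), 4)
               .shuffleGrouping("line-spout");

        // 3. Fields Grouping: every occurrence of the same "word" reaches the same CountBolt
        //    task, which is what makes per-word counting, deduplication and joins correct.
        builder.setBolt("count-bolt", new CountBolt(), 4)
               .fieldsGrouping("split-bolt", new Fields("word"));

        builder.createTopology(); // would be submitted via StormSubmitter / LocalCluster
    }
}
```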

4. All Grouping: broadcast sending; every tuple is delivered to all tasks of the Bolt, so every thread receives exactly the same data (declared as shown in the sketch after item 5).

5. Global Grouping: global grouping; the entire stream is assigned to a single task of the Bolt, specifically the task with the lowest task id. Suitable scenarios: hard to imagine one. A declaration sketch for groupings 4 and 5 follows.
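Continuing the same hypothetical wiring (the QueueSpout and GroupingSketch stubs from above), All Grouping and Global Grouping are declared like this:

```java
import org.apache.storm.topology.TopologyBuilder;

public class BroadcastAndGlobalSketch {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("line-spout", new QueueSpout(), 1);
        builder.setBolt("split-bolt", new GroupingSketch.SplitBolt(), 4)
               .shuffleGrouping("line-spout");

        // 4. All Grouping: each of the 4 tasks receives a copy of every tuple (broadcast),
        //    so every thread sees identical data.
        builder.setBolt("broadcast-bolt", new GroupingSketch.CountBolt(), 4)
               .allGrouping("split-bolt");

        // 5. Global Grouping: the whole stream is funneled into a single task of the bolt,
        //    the one with the lowest task id, regardless of the declared parallelism of 4.
        builder.setBolt("global-bolt", new GroupingSketch.CountBolt(), 4)
               .globalGrouping("split-bolt");
    }
}
```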

6. Direct Grouping: direct grouping, a special grouping method. With this grouping the sender of the message decides which task of the receiving component will process it. It can only be declared on streams that were declared as direct streams, and such tuples must be emitted with the emitDirect method. The sender can obtain the task ids of the receiving component through the TopologyContext (the OutputCollector.emit method also returns the task ids that received the tuple). A sketch of the mechanics follows.
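A sketch under the same assumptions (hypothetical names, Apache Storm 2.x, reusing the earlier stubs): the sender declares a direct stream, looks up the receiver's task ids through the TopologyContext, routes each tuple with emitDirect, and the receiver is wired in with directGrouping:

```java
import java.util.List;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

/** Hypothetical sender: decides itself which task of the receiver handles each word. */
public class DirectSenderBolt extends BaseRichBolt {

    private OutputCollector collector;
    private List<Integer> receiverTaskIds;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Task ids of the receiving component, obtained through the TopologyContext.
        this.receiverTaskIds = context.getComponentTasks("direct-receiver");
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        // The sender picks the target task, here by hashing the word.
        int target = receiverTaskIds.get(Math.floorMod(word.hashCode(), receiverTaskIds.size()));
        collector.emitDirect(target, "direct-stream", input, new Values(word));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // The boolean argument marks "direct-stream" as a direct stream.
        declarer.declareStream("direct-stream", true, new Fields("word"));
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("line-spout", new QueueSpout(), 1);
        builder.setBolt("split-bolt", new GroupingSketch.SplitBolt(), 4)
               .shuffleGrouping("line-spout");
        builder.setBolt("direct-sender", new DirectSenderBolt(), 2)
               .shuffleGrouping("split-bolt");
        // 6. Direct Grouping: the receiver subscribes to the direct stream by name.
        builder.setBolt("direct-receiver", new GroupingSketch.CountBolt(), 4)
               .directGrouping("direct-sender", "direct-stream");
    }
}
```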
