Storm notes

 

There is already plenty of good material about Storm online. I wrote this up from the problems I ran into while learning and using Storm myself,
so the content below is not comprehensive; it focuses on personal experience and a summary of what I learned.

I found these articles with a quick search and think they are quite good:
https://www.cnblogs.com/peak-c/p/6297794.html
https://blog.csdn.net/weiyongle1996/article/details/77142245
(The links above go into more detail; here I only cover the points I consider most important.)

I think learning the Storm framework should focus on the following aspects
(this is a personal view, of course; as my understanding changes over time, I may look at it from another angle):
architecture
core concepts
usage
tuning
the code in depth
project application

Architecture
https://blog.csdn.net/weiyongle1996/article/details/77142245
You can refer to the link above; its content comes from the official documentation, and I have nothing to add.
The architecture diagram is divided into three parts:
nimbus (master node): handles topology submission, management of the supervisor nodes, and so on.
supervisor (worker node): executes the actual code; the important concepts here are the worker and the task described below.
zookeeper (distributed coordination): ZooKeeper's features make it a natural fit for metadata management, and many excellent distributed projects are built on top of it.
In practice I have seen it used by Dubbo, HBase, Hadoop and Kafka (Eureka plays a similar role in Spring Cloud). It is worth looking at which Storm node data it stores.
An architecture diagram lets anyone, outsider or insider, grasp the overall structure of a system at a glance; as you study further, you gradually understand what each part of the diagram represents.

Core concepts:
Topology: a directed acyclic graph (DAG). Its vertices are processing units and its edges are the streams flowing between them, so a topology is the abstract DAG through which data flows.
Its vertices are a combination of spouts and bolts.
task: the unit that actually runs the logic we write. Tasks run on executor threads inside a worker and are the smallest unit of scheduling; by default executors and tasks map 1:1, which can be changed in code (for example with setNumTasks).
worker: one Java (JVM) process, running on a supervisor node.
executor: one Java thread inside a worker that runs one or more tasks; it is the key unit for tuning.
spout: the data source, an abstract concept. In most systems we build, the spout acquires the input data and feeds it into the system, where it is processed and then stored in the appropriate place.
bolt: a processing stage and an important node in the stream; one bolt can consume multiple streams.
stream: the core abstraction of the computation, an unbounded sequence of tuples.
stream grouping: big-data stream computation applies the idea of divide and conquer. Once you have seen MapReduce, you will recognize the role of Map. Take Alipay's annual statement as an analogy: the amount of data Alipay processes is so large I cannot even imagine the number. To produce each person's annual bill, each user's data is first gathered together (the Map stage) and then computed (the Reduce stage). I do not know their actual implementation, but however it is done, it cannot escape the idea of divide and conquer.
Stream grouping is where divide and conquer begins.
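To make the divide-and-conquer idea concrete, here is a stand-alone simulation (plain Java, no Storm dependency) of how tuples could be routed to tasks. In a real topology you would declare this with something like builder.setBolt("count", new CountBolt(), 3).fieldsGrouping("spout", new Fields("user")), where CountBolt is a hypothetical bolt. The class GroupingDemo, the hash-modulo routing and the round-robin stand-in for shuffle grouping are simplifying assumptions, not Storm's exact internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of two stream groupings; not Storm's own code.
class GroupingDemo {

    // Fields grouping (simplified): hash the grouping field, so every tuple
    // with the same key goes to the same task, like a Map-side partition.
    static int fieldsGrouping(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Shuffle grouping, approximated here as round-robin: spreads load evenly
    // but gives no guarantee about which task sees which key.
    static int shuffleGrouping(int tupleIndex, int numTasks) {
        return tupleIndex % numTasks;
    }

    public static void main(String[] args) {
        String[] userIds = {"alice", "bob", "alice", "carol", "bob", "alice"};
        int numTasks = 3;

        // Collect which user ids each task receives under fields grouping.
        Map<Integer, List<String>> byTask = new TreeMap<>();
        for (String user : userIds) {
            byTask.computeIfAbsent(fieldsGrouping(user, numTasks), t -> new ArrayList<>()).add(user);
        }
        // Each user id shows up under exactly one task, so that task can
        // aggregate the user's data locally (the "Reduce" side).
        byTask.forEach((task, users) -> System.out.println("task " + task + " <- " + users));
    }
}
```

The point of the fields grouping is the guarantee, not the exact hash: as long as identical keys always land on the same task, per-user totals can be computed without coordination between tasks.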


Usage
There are plenty of demos online.


Tuning:
The main things to tune are:
number of threads
number of database connections
... sizes of the various resource pools
maxSpoutPending: the maximum number of tuples a spout task may have in flight (emitted but not yet acked or failed) at one time
message timeout: TOPOLOGY_MESSAGE_TIMEOUT_SECS is how long Storm gives a tuple to be fully processed before failing it so it can be replayed (default 30s); adjust it according to processing speed and the number and resources of the nodes
(key point) processing speed: log the execution time of each bolt's logic to monitor it, and keep it as small as possible
data volume
database indexes (or indexes in NoSQL stores)
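The interaction between maxSpoutPending and the message timeout can be sketched with a toy model of one spout task. The class PendingSpoutModel is hypothetical and only illustrates the two settings; in real Storm they are set with Config.setMaxSpoutPending(...) and Config.setMessageTimeoutSecs(...), a timeout leads to the spout's fail() being called, and whether the message is actually replayed depends on the spout implementation (the Kafka spout does replay).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of ONE spout task; hypothetical, not Storm's implementation.
class PendingSpoutModel {
    private final int maxSpoutPending;  // role of topology.max.spout.pending
    private final long timeoutMillis;   // role of topology.message.timeout.secs
    private final Deque<String> source = new ArrayDeque<>();          // e.g. messages waiting in Kafka
    private final Map<String, Long> pending = new LinkedHashMap<>();  // msgId -> emit time
    int replays = 0;

    PendingSpoutModel(int maxSpoutPending, long timeoutMillis) {
        this.maxSpoutPending = maxSpoutPending;
        this.timeoutMillis = timeoutMillis;
    }

    void offer(String msgId) { source.addLast(msgId); }

    // Like nextTuple(): emit nothing while the pending limit is reached.
    String nextTuple(long nowMillis) {
        if (pending.size() >= maxSpoutPending || source.isEmpty()) {
            return null;
        }
        String msgId = source.pollFirst();
        pending.put(msgId, nowMillis);
        return msgId;
    }

    // The whole tuple tree finished in time: the message is done.
    void ack(String msgId) { pending.remove(msgId); }

    // Anything pending longer than the timeout is failed and queued for replay.
    void expire(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() > timeoutMillis) {
                it.remove();
                source.addFirst(e.getKey());
                replays++;
            }
        }
    }

    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        PendingSpoutModel spout = new PendingSpoutModel(2, 30_000);
        for (String id : new String[] {"m1", "m2", "m3"}) spout.offer(id);
        System.out.println(spout.nextTuple(0)); // m1
        System.out.println(spout.nextTuple(0)); // m2
        System.out.println(spout.nextTuple(0)); // null: 2 tuples already pending
        spout.expire(31_000);                   // bolts too slow: m1 and m2 time out
        System.out.println(spout.replays);      // 2
    }
}
```

The model shows why the two knobs must be tuned together: a larger pending limit raises throughput only if the bolts can finish each tuple within the timeout; otherwise it just produces more failures and replays.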

Watch the Storm UI: for spouts, the processing pressure, processing latency and number of failures; for bolts, the processing latency, number of failures and processing pressure.
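Storm UI shows latencies per component, but logging the time of each piece of bolt logic (the "key point" in the tuning list) tells you which step is slow. A minimal sketch, assuming you wrap the body of a bolt's execute() with it; TimedLogic and the warning threshold are hypothetical, and java.util.logging is used only to keep the example self-contained (Storm projects usually log through SLF4J).

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical timing helper for bolt logic.
class TimedLogic {
    private static final Logger LOG = Logger.getLogger(TimedLogic.class.getName());

    // Runs one unit of logic, logs how long it took, and returns its result.
    static <T> T timed(String name, long warnMillis, Supplier<T> logic) {
        long start = System.nanoTime();
        try {
            return logic.get();
        } finally {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMillis >= warnMillis) {
                LOG.warning(name + " took " + elapsedMillis + " ms (threshold " + warnMillis + " ms)");
            } else {
                LOG.fine(name + " took " + elapsedMillis + " ms");
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a slow database write inside execute().
        int rows = timed("saveUserEvent", 50, () -> {
            try {
                Thread.sleep(80);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 1;
        });
        System.out.println(rows);
    }
}
```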


Code in depth
Understand Storm's Ack (message acknowledgement) mechanism.
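The core trick of the ack mechanism is that Storm's acker tracks each spout message with a single 64-bit value: every tuple id in the tree is XORed into it once when the tuple is created and once when it is acked, so the value returns to 0 exactly when the whole tree has been processed. The class below is a hypothetical toy model of that bookkeeping, not the acker's real code; real tuple ids are random 64-bit numbers (so an accidental 0 is practically impossible), while the demo uses 1, 2 and 4 for readability.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the acker's XOR bookkeeping; hypothetical class.
class XorAckerModel {
    // root message id -> running XOR of tuple ids ("ack val")
    private final Map<String, Long> ackVals = new HashMap<>();

    // The spout emits the root tuple of a tree.
    void emitRoot(String msgId, long rootTupleId) {
        ackVals.merge(msgId, rootTupleId, (a, b) -> a ^ b);
    }

    // A bolt acks its input tuple and reports the ids of the new tuples it
    // emitted anchored to that input, all XORed together into one update.
    void ack(String msgId, long inputTupleId, long... anchoredChildIds) {
        long update = inputTupleId;
        for (long child : anchoredChildIds) {
            update ^= child;
        }
        ackVals.merge(msgId, update, (a, b) -> a ^ b);
    }

    // 0 means every id was XORed in twice: created once, acked once.
    boolean fullyProcessed(String msgId) {
        Long v = ackVals.get(msgId);
        return v != null && v == 0L;
    }

    public static void main(String[] args) {
        XorAckerModel acker = new XorAckerModel();
        acker.emitRoot("m1", 0b001);              // spout emits root tuple 1
        acker.ack("m1", 0b001, 0b010, 0b100);     // bolt A acks 1, emits 2 and 4
        System.out.println(acker.fullyProcessed("m1")); // false
        acker.ack("m1", 0b010);                   // bolt B acks 2
        acker.ack("m1", 0b100);                   // bolt B acks 4
        System.out.println(acker.fullyProcessed("m1")); // true
    }
}
```

If any tuple in the tree is never acked, the value never reaches 0, and the message timeout is what eventually turns that into a failure on the spout.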


Project application
In my project the system was used for user behavior analysis: the operation history of users on an e-commerce site was sent to Kafka, Storm computed site-level indicators (PV, UV, number of online users, etc.), and the results were stored and presented to end users.
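For readers new to the two indicators: PV (page views) counts every view, while UV (unique visitors) counts distinct users. A minimal in-memory sketch of what such a statistics bolt might keep, assuming a hypothetical PvUvCounter fed one event per page view; it is not the project's actual code, and an exact in-memory set of user ids may grow too large at real traffic volumes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-site counter; one instance per task under fields grouping.
class PvUvCounter {
    private long pv = 0;                               // every page view counts
    private final Set<String> users = new HashSet<>(); // distinct user ids

    void onPageView(String userId) {
        pv++;
        users.add(userId);
    }

    long pv() { return pv; }

    int uv() { return users.size(); }

    public static void main(String[] args) {
        PvUvCounter counter = new PvUvCounter();
        for (String user : new String[] {"u1", "u2", "u1", "u3", "u1"}) {
            counter.onPageView(user);
        }
        System.out.println("pv=" + counter.pv() + " uv=" + counter.uv()); // pv=5 uv=3
    }
}
```

Note that with this design a replayed tuple is counted twice in PV but not in UV, which is one way repeated message processing shows up in the numbers.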
The biggest problem we ran into was messages being processed repeatedly. Before I understood Storm's mechanisms, I assumed the spout was pulling duplicate data (from Kafka). Once I understood them, I found that it was Storm's failure handling: failed messages were being replayed. The Storm UI showed far too many failures, and printing bolt performance logs showed that processing took too long. After adding an index, increasing the number of processing threads, raising the number of messages the spout handles at a time, and adjusting the spout timeout, performance improved greatly.

Concurrency: the number of threads per worker has always been a big question. We usually set it to the number of CPU cores, but for a standalone application that design fails to make full use of the machine. Even an ordinary Tomcat reaches a concurrency of 100+, and other software systems reach thousands (nginx around 50,000, and likewise netty). Borrowing that idea, the thread count in Storm should be tuned for the specific scenario.
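One way to reason about going beyond "threads = CPU cores" is the well-known rule of thumb from Java Concurrency in Practice: threads = cores * target CPU utilization * (1 + wait time / compute time). A bolt that spends most of its time waiting on a database or network can therefore use many more executors than there are cores. This is a generic heuristic, not something Storm computes for you; ThreadSizing and the example timings are hypothetical, and the result is only a starting point to verify in Storm UI.

```java
// Hypothetical sizing helper based on the common wait/compute heuristic.
class ThreadSizing {
    static int suggestedThreads(int cores, double targetUtilization,
                                double waitMillis, double computeMillis) {
        double n = cores * targetUtilization * (1.0 + waitMillis / computeMillis);
        return Math.max(1, (int) Math.round(n)); // always at least one thread
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // e.g. a bolt that waits 45 ms on the database for every 5 ms of CPU work
        System.out.println("cores=" + cores + " suggested=" + suggestedThreads(cores, 1.0, 45, 5));
    }
}
```

For a purely CPU-bound bolt (wait time near 0) the formula falls back to roughly one thread per core, which is where the usual "set it to the number of cores" habit comes from.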

 


Source: blog.csdn.net/soliy/article/details/91357211