Section 1, Storm Programming: 2. Basic Introduction to Storm

Course Outline:

1. Basic introduction to Storm

2. Storm architecture model

3. Storm installation

4. The Storm UI management interface

5. Storm programming model

6. A first Storm program

7. Storm parallelism

8. Storm message distribution policies

9. Integrating Storm with Kafka

10. Integrated case study: a real-time dashboard


1. Basic introduction to Storm

Storm's official website: http://storm.apache.org/

 

Storm was open-sourced by Twitter. One of its early versions was 0.8.0, and its processing speed is fast.

The larger the island of knowledge, the longer the shoreline of ignorance.

 

Storm is an open-source distributed real-time computation system that makes it simple to reliably process large streams of data. It has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and so on. Storm scales horizontally, is highly fault-tolerant, guarantees that every message is processed, and is fast (on a small cluster, each node can process on the order of a million messages per second). Storm is easy to deploy and operate and, more importantly, you can develop applications for it in any programming language.
Storm has the following characteristics:

  • Simple programming model

Most readers are already familiar with Hadoop. Based on Google's MapReduce, Hadoop gives developers the map and reduce primitives, which make parallel batch processing of big data simple and elegant. Similarly, Storm provides a few simple, elegant primitives (spouts and bolts) for real-time computation on big data, which greatly reduces the complexity of developing parallel real-time processing jobs and helps you build applications quickly and efficiently.
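To make the spout/bolt idea concrete, here is a minimal sketch in plain Python of the classic word-count pipeline: a source emits sentence tuples, one step splits them into words, and another keeps running counts. It does not use Storm's API; `sentence_spout`, `split_bolt`, and `CountBolt` are hypothetical names chosen for illustration. In a real topology each step would run as separate, parallel tasks connected over the network rather than as chained generators.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical stand-ins for Storm's primitives (NOT the Storm API):
# a "spout" is a source of tuples, a "bolt" consumes tuples and emits new ones.

def sentence_spout(sentences: Iterable[str]) -> Iterator[tuple]:
    """Emit one single-field tuple per sentence, like a spout reading a source."""
    for s in sentences:
        yield (s,)

def split_bolt(stream: Iterator[tuple]) -> Iterator[tuple]:
    """Split each sentence tuple into one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

class CountBolt:
    """Keep a running count per word and emit (word, count) on every update."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, stream: Iterator[tuple]) -> Iterator[tuple]:
        for (word,) in stream:
            self.counts[word] += 1
            yield (word, self.counts[word])

counter = CountBolt()
topology = counter.process(split_bolt(sentence_spout(["the cow", "the moon"])))
for tup in topology:
    print(tup)
```

Unlike a batch job, the counting step emits an updated result for every incoming word instead of waiting for the whole input.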

  • Scalable

A topology running in a Storm cluster is made up of three main entities: worker processes, threads, and tasks. Each machine in a Storm cluster can run multiple worker processes, each worker process can create multiple threads, and each thread can execute multiple tasks. Tasks are the entities that actually process data; every spout and bolt we develop is executed as one or more tasks.
Computation is therefore parallelized across threads, processes, and servers, which supports flexible horizontal scaling.
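The three levels can be illustrated with a simplified model. The sketch below assumes that tasks are divided as evenly as possible among executor threads and that executors are spread round-robin over worker processes; the function `assign` is hypothetical, and Storm's actual scheduler handles more cases. The three parameters loosely correspond to a component's number of tasks, its parallelism hint (number of executors), and the number of workers set in the topology configuration.

```python
def assign(num_tasks: int, num_executors: int, num_workers: int) -> dict:
    """Simplified model of task -> thread -> process placement.

    Returns {worker_index: [task_ids_of_executor, ...]}. This is an
    illustration only, not Storm's scheduling algorithm.
    """
    if not (0 < num_executors <= num_tasks) or num_workers < 1:
        raise ValueError("need 1 <= executors <= tasks and at least 1 worker")
    # Split task ids 0..num_tasks-1 into num_executors nearly equal ranges.
    base, extra = divmod(num_tasks, num_executors)
    executors, start = [], 0
    for i in range(num_executors):
        size = base + (1 if i < extra else 0)
        executors.append(list(range(start, start + size)))
        start += size
    # Place executors (threads) on workers (processes) round-robin.
    workers: dict = {w: [] for w in range(num_workers)}
    for i, tasks in enumerate(executors):
        workers[i % num_workers].append(tasks)
    return workers

print(assign(num_tasks=6, num_executors=3, num_workers=2))
```

Raising any of the three numbers spreads the same work over more tasks, threads, or processes, which is what makes horizontal scaling a matter of configuration rather than code changes.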

  • High reliability

Storm can guarantee that every message emitted by a spout is "fully processed", which is what directly sets it apart from other real-time systems such as S4.
Note that a message emitted by a spout may go on to trigger thousands of further messages. You can picture these as a message tree whose root is the message emitted by the spout. Storm tracks the processing of this tree, and only when every message in the tree has been processed does Storm consider the spout's message "fully processed". If processing of any message in the tree fails, or the whole tree is not fully processed within a time limit, the spout's message is re-sent.
To minimize memory consumption, Storm does not track every message in the tree individually. Instead, it uses a special strategy that tracks the tree as a whole: it XORs together the unique ids of the messages in the tree and decides whether the spout's message has been fully processed by checking whether the result is zero. This saves a great deal of memory and simplifies the decision logic; the mechanism is described in detail later.
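The XOR trick can be shown with a small simulation. Each spout tuple keeps one 64-bit "ack value". The spout's tuple id is XORed in when it is emitted; when a bolt acks an input tuple, the XOR of that input's id and the ids of any new tuples anchored to it is XORed in. Every id therefore enters the value exactly twice and cancels out, so the value returns to zero once the whole tree is processed. This is a simplified model, not Storm's acker implementation: `AckTracker` and its methods are hypothetical names, and timeouts are omitted.

```python
import secrets
from functools import reduce
from operator import xor

class AckTracker:
    """Simplified model of XOR-based tuple-tree tracking (hypothetical)."""

    def __init__(self) -> None:
        # One constant-size value per spout tuple, however big its tree grows.
        self.ack_vals: dict[int, int] = {}

    def emit_from_spout(self, root_id: int, tuple_id: int) -> None:
        self.ack_vals[root_id] = tuple_id

    def ack(self, root_id: int, input_id: int, anchored_child_ids=()) -> bool:
        """A bolt acks `input_id` after emitting `anchored_child_ids`.

        Returns True when the ack value reaches zero (tree fully processed).
        """
        self.ack_vals[root_id] ^= reduce(xor, anchored_child_ids, input_id)
        if self.ack_vals[root_id] == 0:
            del self.ack_vals[root_id]
            return True
        return False

def new_id() -> int:
    return secrets.randbits(64)  # random 64-bit ids, as in the description above

tracker = AckTracker()
root, t0 = new_id(), new_id()
tracker.emit_from_spout(root, t0)
t1, t2 = new_id(), new_id()
print(tracker.ack(root, t0, [t1, t2]))  # bolt A splits t0 into t1 and t2
print(tracker.ack(root, t1))            # bolt B finishes t1
print(tracker.ack(root, t2))            # bolt B finishes t2
```

Apart from an astronomically unlikely collision of random ids, only the last call reports completion, and the memory used per spout tuple stays constant regardless of the tree's size.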
In this model, every message sent is followed by an ack/fail message, which consumes some network bandwidth. If your reliability requirements are low, you can turn this mode off by using a different emit interface (for example, emitting a spout tuple without a message id, so that Storm does not track it).
As mentioned above, Storm guarantees that each message is processed at least once. Some computations, however, strictly require each message to be processed exactly once. Fortunately, Storm 0.7.0 introduced transactional topologies to solve this problem; they are described in detail later.
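The idea behind exactly-once updates can be sketched as follows, assuming (as transactional topologies do) that every batch carries a transaction id (txid), that batches are applied in txid order, and that a replayed batch reuses its txid. Storing the txid next to each value lets the updater recognize a replay and skip it. `TransactionalCounter` and the in-memory `db` are hypothetical stand-ins for a real state store, not Storm classes.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class StoredValue:
    count: int
    txid: int  # id of the last batch applied to this value

class TransactionalCounter:
    """Hypothetical word counter that tolerates replayed batches."""

    def __init__(self) -> None:
        self.db: dict[str, StoredValue] = {}  # stands in for an external database

    def apply_batch(self, txid: int, words: list[str]) -> None:
        for word, n in Counter(words).items():
            stored = self.db.get(word)
            if stored is not None and stored.txid == txid:
                continue  # this batch was already applied to this key: a replay
            base = stored.count if stored is not None else 0
            # Value and txid are written together, as one atomic update.
            self.db[word] = StoredValue(base + n, txid)

counter = TransactionalCounter()
counter.apply_batch(1, ["a", "b", "a"])
counter.apply_batch(1, ["a", "b", "a"])  # replay of batch 1 is ignored
counter.apply_batch(2, ["a"])
print({w: v.count for w, v in counter.db.items()})
```

The check only works because batches are strictly ordered: if batch 2 could be applied before batch 1, an equal txid would no longer prove that the update had already happened.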

  • High fault tolerance

If an exception occurs while a message is being processed, Storm reschedules the failed processing unit. Storm guarantees that a processing unit keeps running forever (unless you explicitly kill it).
Of course, if a processing unit keeps intermediate state, then when Storm restarts it, the application itself is responsible for recovering that state.
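Since recovering intermediate state is left to the application, here is one minimal approach, sketched under the assumption of some external key-value store (simulated by a plain dict): persist the state whenever it changes and reload it when the processing unit starts. `CheckpointingCounter` is a hypothetical name, not a Storm class.

```python
import json

class CheckpointingCounter:
    """Hypothetical bolt-like counter that survives restarts via an external store."""

    def __init__(self, store: dict, key: str) -> None:
        self.store = store
        self.key = key
        saved = store.get(key)
        # On (re)start, reload the last persisted state if there is one.
        self.counts: dict[str, int] = json.loads(saved) if saved else {}

    def execute(self, word: str) -> None:
        self.counts[word] = self.counts.get(word, 0) + 1
        # Persist after every update (simple, but costs one write per tuple).
        self.store[self.key] = json.dumps(self.counts)

external_store: dict = {}
first = CheckpointingCounter(external_store, "word-counter")
for w in ["a", "b", "a"]:
    first.execute(w)

# Simulate Storm restarting the processing unit after a crash.
restarted = CheckpointingCounter(external_store, "word-counter")
print(restarted.counts)
```

Combined with at-least-once delivery, a tuple replayed after a crash would be counted twice by this sketch; avoiding that requires something like the txid technique used by transactional topologies.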

  • Support for multiple programming languages

Besides implementing spouts and bolts in Java, you can use any programming language you are familiar with, thanks to Storm's so-called multi-language protocol. This is a special protocol inside Storm that lets a spout or bolt exchange messages over standard input and standard output; each message is JSON-encoded and sent as one or more lines of text.
Storm implements multi-language support mainly through the ShellBolt, ShellSpout, and ShellProcess classes. ShellBolt and ShellSpout implement the IBolt and ISpout interfaces respectively, and use Java's ProcessBuilder to launch the script or program that speaks the protocol.
As a consequence, every tuple must be JSON-encoded and decoded as it is processed, which has a significant impact on throughput.
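The message framing of the multi-language protocol can be sketched in a few lines. Storm's multi-language protocol documentation describes each message as JSON followed by a line containing only `end`. The field names below (`id`, `comp`, `stream`, `task`, `tuple`, `command`, `anchors`) follow that documentation but should be treated as assumptions and checked against it. The initial handshake and heartbeats are omitted, `write_message`/`read_message` are hypothetical helpers, and `io.StringIO` stands in for the process's real stdin and stdout.

```python
import io
import json

def write_message(out, msg: dict) -> None:
    """Write one protocol message: JSON text, then a line containing only 'end'."""
    out.write(json.dumps(msg) + "\nend\n")
    out.flush()

def read_message(inp) -> dict:
    """Read lines until the 'end' line and decode the JSON collected before it."""
    lines = []
    while True:
        line = inp.readline()
        if not line:
            raise EOFError("stream closed before 'end' line")
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)

# Simulate Storm sending a tuple to the bolt's stdin (hypothetical values).
stdin = io.StringIO()
write_message(stdin, {"id": "123", "comp": "sentence-spout", "stream": "default",
                      "task": 9, "tuple": ["the cow jumped"]})
stdin.seek(0)
incoming = read_message(stdin)

# The bolt emits one tuple per word, anchored to the input, then acks the input.
stdout = io.StringIO()
for word in incoming["tuple"][0].split():
    write_message(stdout, {"command": "emit", "anchors": [incoming["id"]], "tuple": [word]})
write_message(stdout, {"command": "ack", "id": incoming["id"]})
print(stdout.getvalue())
```

Every tuple passes through `json.dumps` and `json.loads` on both sides of the pipe, which is exactly the encoding overhead that lowers throughput compared with a native Java bolt.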

  • Support local mode

Storm has a "local mode" that simulates all the functionality of a Storm cluster within a single process. Running a topology in local mode is similar to running it on a cluster, which makes local mode very useful for development and testing.

  • Efficient

 

Compared with mapreduce:

Storm is faster; MapReduce is slower.

Storm does stream processing; MapReduce does batch processing.

 

In short, its defining features are stream processing and processing speed.


Source: www.cnblogs.com/mediocreWorld/p/11223342.html