流计算框架storm概览

Attention: 

supervison 和 nimbus的状态都实时保存在zookeeper集群中和本地.  Enchance, this means you can kill -9 Nimbus or the Supervisors and they'll start back up as nothing happened. 

Topologies

1. storm jar all-my-code.jar org.apache.storm.MyTopology arg1 arg2   

The main function of the class defines the topology and submits it to Nimbus. 

The storm jar part takes care of connecting to Nimbus and uploading the jar.

2. Since topology definitions are just Thrift structs, and Nimbus is a Thrift service, you can create and submit topologies using any programming language.    (相当于任何语言都可以进行http协议,提交服务器一样.)

(Thrift是一种接口描述语言和二进制通讯协议,它被用来定义和创建跨语言的服务。它被当作一个远程过程调用(RPC)框架来使用,是由Facebook为“大规模跨语言服务开发”而开发的。)

Streams

The core abstraction in Storm is the "stream".  A stream is an unbounded sequence of tuples.

Networks of spouts and bolts are packaged into a "topology" which is the top-level abstraction that you submit to Storm clusters for execution.

When a spout or bolt emits a tuple to a stream, it sends the tuple to every bolt that subscribed to that stream.

 

Data model

   Storm uses tuples(元组,数组.) as its data model. A tuple is a named list of values, and a field in a tuple can be an object of any type.

   Every node in a topology must declare the output fields for the tuples it emits.

   

A simple topology

TopologyBuilder builder = new TopologyBuilder();        
builder.setSpout("words", new TestWordSpout(), 10);        
builder.setBolt("exclaim1", new ExclamationBolt(), 3)
        .shuffleGrouping("words");
builder.setBolt("exclaim2", new ExclamationBolt(), 2)
        .shuffleGrouping("exclaim1");
 
 

Running ExclamationTopology in local mode

To run a topology in local mode run the command storm local instead of storm jar.

Stream groupings

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("sentences", new RandomSentenceSpout(), 5);        
builder.setBolt("split", new SplitSentence(), 8)
        .shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 12)
        .fieldsGrouping("split", new Fields("word"));

 
 

分组类型的区别: shuffleGrouping, fieldsGrouping
shuffleGrouping 随机发送tuple给bolt.
fieldsGrouping 按照固定字段去发送tuple给bolt.

Fields groupings are the basis of implementing streaming joins and streaming aggregations as well as a plethora of other use cases. Underneath the hood, fields groupings are implemented using mod hashing.

字段分组是实现流连接和流聚合以及大量其他用例的基础。在幕后,字段分组是使用mod哈希实现的。

Defining Bolts in other languages

 Storm ships with adapter libraries for Ruby, Python, and Fancy.

public static class SplitSentence extends ShellBolt implements IRichBolt {
    public SplitSentence() {
        super("python", "splitsentence.py");
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
 
 

import storm

class SplitSentenceBolt(storm.BasicBolt):
    def process(self, tup):
        words = tup.values[0].split(" ")
        for word in words:
          storm.emit([word])

SplitSentenceBolt().run()

Trident (achieve exactly-once messaging semantics for most computations) 

Storm guarantees that every message will be played through the topology at least once. A common question asked is "how do you do things like counting on top of Storm? Won't you overcount?" Storm has a higher level API called Trudent that let you achieve exactly-once messaging semantics for most computations.  

大多数计算实现一次消息传递语义, 对统计具有重要意义.

Trident developed from an earlier effort to provide exactly-once guarantees for Storm

猜你喜欢

转载自blog.csdn.net/gridlayout/article/details/129286794