Reliability in Storm
Storm's ISpout interface defines three methods related to reliability: nextTuple, ack, and fail.
public interface ISpout extends Serializable {
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
    void close();
    // activate() and deactivate() are also declared by ISpout; omitted here
    void nextTuple();
    void ack(Object msgId);
    void fail(Object msgId);
}
Storm calls the spout's nextTuple() method whenever it wants the spout to emit a tuple. The first step toward reliable processing is to give each emitted tuple a unique message ID and pass that ID to emit():
collector.emit(new Values("value1", "value2"), msgId);
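The message ID tells Storm that the spout wants to be notified once the tuple tree rooted at this tuple has either been fully processed or has failed. If processing succeeds, Storm calls the spout's ack() method; if it fails, Storm calls fail(). In both cases the tuple's message ID is passed as the argument.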
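Note that although spouts have this reliability mechanism, whether it is used is up to us: a tuple emitted without a message ID is not tracked at all. IBasicBolt, intended for simpler computations, automatically anchors emitted tuples to the input tuple and acks the input once execute() returns. With IRichBolt, we must anchor emitted tuples ourselves and call ack explicitly.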
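To make the IBasicBolt behaviour concrete, here is a minimal, Storm-free sketch of what such an executor does around user code, assuming the simplified behaviour described above: every emit is anchored to the input, the input is acked when the user code returns normally, and it is failed when a designated exception is thrown (Storm uses FailedException for this). BasicBoltModel, Body and ProcessingFailed are hypothetical names, not Storm classes. With IRichBolt, both the anchoring and the ack/fail calls become the user's own responsibility.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, Storm-free model of an IBasicBolt-style executor.
// Not Storm's API: it only records the events the acker would see.
class BasicBoltModel {
    // Stand-in for Storm's FailedException: "fail this input tuple".
    static class ProcessingFailed extends RuntimeException {
        ProcessingFailed(String msg) { super(msg); }
    }

    // User logic: receives the input and an emit function.
    interface Body {
        void execute(String input, Consumer<String> emit);
    }

    // Acker-visible events, in order.
    final List<String> events = new ArrayList<>();

    void run(String input, Body body) {
        try {
            // Every emit is automatically anchored to the current input.
            body.execute(input, out -> events.add("emit " + out + " <- " + input));
            // Normal return: the input is acked automatically.
            events.add("ack " + input);
        } catch (ProcessingFailed e) {
            // Signalled failure: the input is failed automatically.
            events.add("fail " + input);
        }
    }

    public static void main(String[] args) {
        BasicBoltModel model = new BasicBoltModel();
        Body split = (sentence, emit) -> {
            if (sentence.isEmpty()) throw new ProcessingFailed("empty sentence");
            for (String word : sentence.split(" ")) emit.accept(word);
        };
        model.run("my dog", split);  // two anchored emits, then an automatic ack
        model.run("", split);        // exception, so an automatic fail
        System.out.println(model.events);
    }
}
```

The user code never mentions ack or fail; the wrapper decides based on how execute() ends.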
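Storm's reliability is determined jointly by spouts and bolts, and is built on anchoring. When a tuple emitted by the spout has been fully processed, the spout's ack method is called; if processing fails, its fail method is called. In a bolt, a new tuple is anchored to its input by emitting it as collector.emit(inputTuple, newValues); when processing succeeds the bolt calls collector.ack(inputTuple), and on failure collector.fail(inputTuple). A tuple together with all of its descendants forms a tuple tree. The spout's ack is called only when every tuple in the tree has been acked within the timeout; if any tuple in the tree fails, the spout's fail is called instead.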
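Storm tracks a whole tuple tree with very little state. According to Storm's documentation on guaranteed message processing, each tuple gets a random 64-bit ID and an acker task keeps a single XOR value per spout tuple: every ID is XORed in once when the tuple is created (anchored) and once more when it is acked, so the value returns to zero exactly when every tuple in the tree has been acked, barring a vanishingly unlikely collision. The sketch below models only that bookkeeping; AckerModel and its methods are hypothetical names, not Storm's acker API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical model of the acker's XOR bookkeeping (not Storm's code).
class AckerModel {
    // One XOR value per spout message ID.
    private final Map<Object, Long> ackVal = new HashMap<>();
    private final Random random = new Random(42);

    // Random 64-bit ID for a new tuple in some tree.
    long newTupleId() { return random.nextLong(); }

    // The spout emitted the root tuple under message ID msgId.
    void rootEmitted(Object msgId, long rootId) { ackVal.put(msgId, rootId); }

    // A bolt acks ackedId after anchoring childIds to it. Returns true when
    // the tree is complete, i.e. when spout.ack(msgId) would be called.
    boolean ack(Object msgId, long ackedId, long... childIds) {
        Long current = ackVal.get(msgId);
        if (current == null) return false;  // unknown or already failed tree
        long v = current ^ ackedId;
        for (long c : childIds) v ^= c;
        if (v == 0L) {
            ackVal.remove(msgId);
            return true;
        }
        ackVal.put(msgId, v);
        return false;
    }

    // Any explicit fail in the tree fails the whole tree at once
    // (spout.fail(msgId) would be called).
    void fail(Object msgId) { ackVal.remove(msgId); }

    boolean isPending(Object msgId) { return ackVal.containsKey(msgId); }

    public static void main(String[] args) {
        AckerModel acker = new AckerModel();
        long root = acker.newTupleId();
        acker.rootEmitted("m1", root);
        long w1 = acker.newTupleId(), w2 = acker.newTupleId();
        System.out.println(acker.ack("m1", root, w1, w2)); // false: two children outstanding
        System.out.println(acker.ack("m1", w1));           // false: w2 still outstanding
        System.out.println(acker.ack("m1", w2));           // true: tree fully processed
    }
}
```

The demo mirrors the split-sentence case: the root is acked together with two anchored children, and the tree completes only when the last leaf is acked.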
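IBasicBolt also fails the input automatically when execute() throws a FailedException, whereas with IRichBolt we call both ack and fail by hand. The TOPOLOGY_MESSAGE_TIMEOUT_SECS setting (default 30 seconds) specifies how long a tuple tree has to complete; if it is not fully processed within that time, the spout's fail method is called as well.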
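The timeout rule can be sketched as a map from message ID to emit time that is swept periodically. This is a simplified model under stated assumptions: time is passed in explicitly so the behaviour is deterministic, and expiry is exact. Storm's internal implementation differs, and TimeoutModel is a hypothetical name. In a real topology the value is set through Config.setMessageTimeoutSecs(int), i.e. the topology.message.timeout.secs key.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of message-timeout handling (not Storm's implementation).
class TimeoutModel {
    private final long timeoutMillis;
    // Message ID -> time its tree was started, in emit order.
    private final Map<Object, Long> emittedAt = new LinkedHashMap<>();

    TimeoutModel(int timeoutSecs) { this.timeoutMillis = timeoutSecs * 1000L; }

    void emitted(Object msgId, long nowMillis) { emittedAt.put(msgId, nowMillis); }

    // Tree completed in time: stop tracking it.
    void acked(Object msgId) { emittedAt.remove(msgId); }

    // Returns the IDs whose trees ran out of time; spout.fail(msgId)
    // would be called for each of them.
    List<Object> expire(long nowMillis) {
        List<Object> failed = new ArrayList<>();
        Iterator<Map.Entry<Object, Long>> it = emittedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Object, Long> e = it.next();
            if (nowMillis - e.getValue() >= timeoutMillis) {
                failed.add(e.getKey());
                it.remove();
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        TimeoutModel t = new TimeoutModel(30);  // 30 s, Storm's default
        t.emitted("a", 0L);
        t.emitted("b", 10_000L);
        t.acked("b");                           // b finished in time
        System.out.println(t.expire(31_000L));  // prints [a]
    }
}
```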
A spout that implements reliability:
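// Storm 1.0+ package names; older releases use backtype.storm.* instead
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class ReliableSentenceSpout extends BaseRichSpout {
    private static final long serialVersionUID = 1L;
    // Emitted but not yet acked tuples, keyed by message ID
    private ConcurrentHashMap<UUID, Values> pending;
    private SpoutOutputCollector collector;
    private String[] sentences = { "my dog has fleas", "i like cold beverages",
        "the dog ate my homework", "don't have a cow man", "i don't think i like fleas" };
    private int index = 0;

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.pending = new ConcurrentHashMap<UUID, Values>();
    }

    public void nextTuple() {
        Values values = new Values(sentences[index]);
        UUID msgId = UUID.randomUUID();
        this.pending.put(msgId, values);
        // Emitting with a message ID makes Storm track this tuple tree
        this.collector.emit(values, msgId);
        index++;
        if (index >= sentences.length) {
            index = 0;
        }
        //Utils.waitForMillis(1);
    }

    public void ack(Object msgId) {
        // Fully processed: forget it
        this.pending.remove(msgId);
    }

    public void fail(Object msgId) {
        // Failed or timed out: replay the same values under the same ID
        this.collector.emit(this.pending.get(msgId), msgId);
    }
}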
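Stripped of the Storm types, ReliableSentenceSpout follows a simple pending-map pattern: remember each tuple under its message ID, forget it on ack, replay it on fail. The sketch below models that lifecycle in plain Java so it runs without a cluster; the `sent` list stands in for collector.emit(values, msgId), and PendingReplayModel is a hypothetical name. Unlike the spout above, it skips the replay when the ID is no longer pending.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, Storm-free model of the pending-map replay pattern.
class PendingReplayModel {
    final Map<UUID, String> pending = new ConcurrentHashMap<>();
    // Stand-in for collector.emit(values, msgId): everything sent, in order.
    final List<String> sent = new ArrayList<>();

    UUID emit(String sentence) {
        UUID msgId = UUID.randomUUID();
        pending.put(msgId, sentence);  // keep it until the outcome is known
        sent.add(sentence);
        return msgId;
    }

    // Tree fully processed: the tuple can be forgotten.
    void ack(UUID msgId) { pending.remove(msgId); }

    // Tree failed or timed out: replay under the same message ID.
    void fail(UUID msgId) {
        String sentence = pending.get(msgId);
        if (sentence != null) sent.add(sentence);
    }

    public static void main(String[] args) {
        PendingReplayModel spout = new PendingReplayModel();
        UUID id = spout.emit("my dog has fleas");
        spout.fail(id);  // replayed once
        spout.ack(id);   // now forgotten
        System.out.println(spout.sent.size() + " sends, " + spout.pending.size() + " pending");
        // prints "2 sends, 0 pending"
    }
}
```

Note that both the spout above and this model retry without limit: a tuple whose processing always fails is replayed forever.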
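Example 2: a spout that uses an incrementing integer as the message ID and only logs acks and failures (failed tuples are not replayed):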
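// Storm 1.0+ package names; older releases use backtype.storm.* instead
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class RandomSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private Random rand;
    private AtomicInteger counter;
    private static String[] sentences = new String[]{"edi:I'm happy", "marry:I'm angry",
        "john:I'm sad", "ted:I'm excited", "laden:I'm dangerous"};

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.rand = new Random();
        counter = new AtomicInteger();
    }

    @Override
    public void nextTuple() {
        Utils.sleep(5000);
        String toSay = sentences[rand.nextInt(sentences.length)];
        // An incrementing integer serves as the message ID
        int msgId = this.counter.getAndIncrement();
        toSay = "[" + msgId + "]" + toSay;
        // System.out.println replaces the original's custom PrintHelper.print (not shown)
        System.out.println("Send " + toSay);
        this.collector.emit(new Values(toSay), msgId);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    @Override
    public void ack(Object msgId) {
        System.out.println("ack " + msgId);
    }

    @Override
    public void fail(Object msgId) {
        // Only logged: the failed sentence is not re-emitted
        System.out.println("fail " + msgId);
    }
}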
A bolt that implements reliability:
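// Storm 1.0+ package names; older releases use backtype.storm.* instead
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class ReliableSplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    public void execute(Tuple tuple) {
        String sentence = tuple.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            // Anchor each word to the input so it joins the same tuple tree
            this.collector.emit(tuple, new Values(word));
        }
        // Ack the input only after all child tuples have been emitted
        this.collector.ack(tuple);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}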