Storm Hello World: a getting-started example in Java
Core concepts in Storm:
Stream: an unbounded sequence of Tuples, created and processed in parallel in a distributed fashion.
Spout: the message producer in a Topology (i.e., the creator of Tuples).
Bolt: the message processor.
A Bolt receives Tuples emitted by a Spout (or another Bolt), processes them, and produces new output streams. It can perform filtering, joining, aggregation, and similar operations.
Lifecycle: the client creates the Bolt, which is serialized into the topology and submitted to the cluster's master node. The cluster launches worker processes, which deserialize the Bolt and call its prepare method before it begins processing Tuples.
Tuple: the agreed-upon data structure for message passing, in Fields-Values form. A rough analogy is a database table: the Fields are the schema definition, and each Tuple's content is a list of Values matching that schema. Natively supported field types are integer, float, double, long, short, string, and byte; any other type must be serializable.
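The Fields-Values idea can be sketched in plain Java with no Storm dependency. This is a simplified illustration, not the real backtype.storm.tuple classes, which carry much more machinery:

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the Fields/Values pairing: a fixed schema of field
// names plus a positional list of values, looked up by field name.
public class TupleSketch {
    private final List<String> fields;
    private final List<Object> values;

    public TupleSketch(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("schema/value size mismatch");
        }
        this.fields = fields;
        this.values = values;
    }

    // Resolve a value by its declared field name (Storm's Tuple offers
    // a similar lookup alongside positional access).
    public Object getValueByField(String field) {
        return values.get(fields.indexOf(field));
    }

    public static void main(String[] args) {
        TupleSketch t = new TupleSketch(
                Arrays.asList("word", "count"),
                Arrays.asList("storm", 3));
        System.out.println(t.getValueByField("word"));  // storm
        System.out.println(t.getValueByField("count")); // 3
    }
}
```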
(See the backtype.storm.tuple.Fields source for the details.)
Tuple lifecycle: implementations of the backtype.storm.spout.ISpout interface are responsible for creating and emitting Tuples.
Task: a running instance of a Spout/Bolt, i.e. the execution unit of a Spout/Bolt. (By default each Task runs in its own executor thread.)
Worker: a Java process that executes a subset of a Topology's tasks. A worker starts one or more executor threads to run the Topology's components (Spouts/Bolts), so at runtime a Topology may span one or more workers.
Stream Grouping: defines how a stream's Tuples are distributed among a Bolt's tasks. The main grouping types are:
- Shuffle grouping: Tuples are distributed randomly, and roughly evenly, across the Bolt's tasks
- Fields grouping: the stream is partitioned by the specified fields, so Tuples with equal field values always go to the same task
- All grouping: every task of the Bolt receives a copy of every Tuple
- Global grouping: the entire stream goes to a single task of the Bolt
- None grouping: the producer does not care how the stream is grouped (currently behaves like shuffle grouping)
- Direct grouping: the producer of a Tuple decides which consumer task receives it
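As a rough illustration (not Storm's actual routing code), the difference between shuffle and fields grouping can be modeled as how a target task index is chosen for each Tuple:

```java
import java.util.Random;

// Illustrative sketch of two grouping strategies: shuffle grouping picks a
// random downstream task; fields grouping hashes the grouping field so the
// same field value always lands on the same task.
public class GroupingSketch {
    private static final Random random = new Random();

    // Shuffle grouping: any of the numTasks downstream tasks, at random.
    static int shuffleGrouping(int numTasks) {
        return random.nextInt(numTasks);
    }

    // Fields grouping: a deterministic hash of the field value.
    static int fieldsGrouping(Object fieldValue, int numTasks) {
        return Math.abs(fieldValue.hashCode() % numTasks);
    }

    public static void main(String[] args) {
        // The same word always routes to the same task index.
        System.out.println(
            fieldsGrouping("storm", 4) == fieldsGrouping("storm", 4)); // true
    }
}
```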
Topology:
In distributed mode, the topology must be packaged into a JAR and submitted to Nimbus. When packaging, do not bundle the Storm jars themselves, since they are already on the cluster's classpath; other third-party dependencies do need to go into the JAR.
Submit on the nimbus host: ./bin/storm jar stormapp.jar java-main-class args
Example: word count
Flow: input text -> split into words -> count occurrences per word
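Before wiring this into Storm, the split-then-count flow can be run as plain Java. This standalone sketch implements the same logic the bolts below will carry:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone version of the word-count pipeline: split each sentence on
// spaces, then accumulate per-word counts in a map.
public class WordCountSketch {
    static Map<String, Integer> count(String... sentences) {
        Map<String, Integer> counts = new HashMap<>();
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                counts.merge(word, 1, Integer::sum); // increment, starting at 1
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello storm", "first is java", "second is python"));
    }
}
```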
1. The message source: Spout
import java.util.Map;
import java.util.Random;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

/**
 * Data source: emits a random sentence from a fixed list.
 */
public class WordSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;

    private static final String[] msgs = new String[] {
        "word spout",
        "hello storm",
        "java python",
        "first is java",
        "second is python",
        "do what you want to do"
    };

    private static final Random random = new Random();

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        System.out.println("word spout open method called. with conf=" + conf);
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String line = msgs[random.nextInt(msgs.length)];
        System.out.println("****************************word spout nextTuple method called.===" + line);
        this.collector.emit(new Values(line)); // emit a random sentence
        try {
            Thread.sleep(2000);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        System.out.println("execute once.");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        System.out.println("word spout declareOutputFields method called.");
        declarer.declare(new Fields("sentence"));
    }
}
2. The message processors: Bolts
Word splitting:
import java.util.Map;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.IBasicBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

/**
 * Receives sentences from the spout and splits each one into individual words.
 */
public class WordSplitBolt implements IBasicBolt {

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        System.out.println("WordSplitBolt declareOutputFields called.");
        declarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        System.out.println("WordSplitBolt getComponentConfiguration called.");
        return null;
    }

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        System.out.println("WordSplitBolt prepare called.");
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        System.out.println("WordSplitBolt execute called.");
        String sentence = input.getString(0);
        for (String word : sentence.split(" ")) {
            collector.emit(new Values(word)); // one output tuple per word
        }
    }

    @Override
    public void cleanup() {
        System.out.println("WordSplitBolt cleanup called.");
    }
}
Word counting:
import java.util.HashMap;
import java.util.Map;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.IBasicBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

/**
 * Counts occurrences of each word.
 */
public class WordCountBolt implements IBasicBolt {

    // Note: static state is per JVM, so in a multi-worker deployment
    // each worker process would keep its own counts.
    private static Map<String, Integer> count = new HashMap<String, Integer>();

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        System.out.println("WordCountBolt declareOutputFields called.");
        declarer.declare(new Fields("word", "count"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        System.out.println("WordCountBolt getComponentConfiguration called.");
        return null;
    }

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        System.out.println("WordCountBolt prepare called.");
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        System.out.println("WordCountBolt execute called.");
        String word = input.getString(0);
        int wordCount = 0;
        if (count.containsKey(word)) {
            wordCount = count.get(word).intValue();
        }
        wordCount++;
        count.put(word, wordCount);
        collector.emit(new Values(word, wordCount)); // running total per word
    }

    @Override
    public void cleanup() {
        System.out.println("WordCountBolt cleanup called.");
    }
}
3. The Topology
3.1 A topology is composed of Spouts and Bolts, connected to each other via stream groupings.
3.2 Once started, a topology runs forever until it is killed.
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;

public class WordTopology {

    public static void main(String[] args) {
        System.out.println("start word topology.");
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-spout", new WordSpout());
        builder.setBolt("word-split", new WordSplitBolt()).shuffleGrouping("word-spout");
        builder.setBolt("word-count", new WordCountBolt()).shuffleGrouping("word-split");

        Config conf = new Config();
        conf.put("paramContext", "cust"); // custom parameter made available via the topology config
        conf.setDebug(true);

        LocalCluster cluster = new LocalCluster(); // run in local mode for testing
        cluster.submitTopology("word-count", conf, builder.createTopology());
        System.out.println("finished word topology.");
    }
}
4. Topology execution flow and method-call order
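The call order can be simulated in a rough standalone sketch (plain Java, no Storm dependency; MiniSpout/MiniBolt are hypothetical stand-ins): the framework calls open/prepare once at startup, then repeatedly calls nextTuple on the spout and execute on the bolt for each emitted tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified driver imitating the method-call order Storm uses:
// open()/prepare() once at startup, then nextTuple()/execute() in a loop.
public class LifecycleSketch {
    interface MiniSpout { void open(); String nextTuple(); }
    interface MiniBolt { void prepare(); List<String> execute(String input); }

    static List<String> run(MiniSpout spout, MiniBolt bolt, int iterations) {
        List<String> out = new ArrayList<>();
        spout.open();    // called once, like ISpout.open
        bolt.prepare();  // called once, like IBolt.prepare
        for (int i = 0; i < iterations; i++) {
            String tuple = spout.nextTuple(); // spout emits one tuple
            out.addAll(bolt.execute(tuple));  // bolt processes it
        }
        return out;
    }

    public static void main(String[] args) {
        MiniSpout spout = new MiniSpout() {
            public void open() {}
            public String nextTuple() { return "hello storm"; }
        };
        MiniBolt split = new MiniBolt() {
            public void prepare() {}
            public List<String> execute(String s) { return Arrays.asList(s.split(" ")); }
        };
        System.out.println(run(spout, split, 2)); // [hello, storm, hello, storm]
    }
}
```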