Storm Development Part 1 - A Getting-Started Example

A hello-world example of Storm development in Java.

Core concepts in Storm:

Stream: an unbounded sequence of Tuples, created and processed in parallel in a distributed fashion

Spout: the producer of messages in a Topology (i.e., the creator of Tuples)

Bolt: the processor of messages

        A Bolt receives the Tuples emitted by a Spout, processes the data, and produces new output streams. It can perform operations such as filtering, joining, and aggregation.

        Lifecycle: the client creates the Bolt, which is serialized into the topology and submitted to the cluster's master node. The cluster launches worker processes, which deserialize the Bolt and call its prepare method before it starts processing Tuples.


Tuple: can be thought of simply as the data structure agreed upon for message passing, in a Fields-Values form. A rough analogy is a database table: once the Fields (the schema) are defined, each piece of data is a list of Values. The field types supported by default are Integer, Float, Double, Long, Short, String, and Byte; other types require custom serialization.

    

A rough look at what the corresponding Fields class does:
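The actual source is not reproduced in this post. As a simplified, illustrative sketch (the class and field names here are hypothetical, not the real implementation), Fields essentially wraps an ordered list of field names together with a name-to-position index, so a value can be looked up by field name:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the idea behind backtype.storm.tuple.Fields:
// an ordered list of field names plus a name -> position index.
// (Illustrative only; the real class also implements Iterable and Serializable.)
public class SimpleFields {
    private final List<String> fields = new ArrayList<>();
    private final Map<String, Integer> index = new HashMap<>();

    public SimpleFields(String... names) {
        for (String name : names) {
            index.put(name, fields.size());
            fields.add(name);
        }
    }

    // Position of a field name within a tuple's values list
    public int fieldIndex(String name) {
        Integer i = index.get(name);
        if (i == null) {
            throw new IllegalArgumentException(name + " does not exist");
        }
        return i;
    }

    public boolean contains(String name) {
        return index.containsKey(name);
    }

    public int size() {
        return fields.size();
    }
}
```

This is why `declarer.declare(new Fields("word", "count"))` later in the example pairs up with `new Values(word, wordCount)`: the Fields give names and positions, the Values supply the data at those positions.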


        Tuple lifecycle: implementations of the backtype.storm.spout.ISpout interface are responsible for producing and emitting Tuples

Task: each running thread of a Spout/Bolt is called a Task; it is the execution unit of the Spout/Bolt

Worker: a Java process that executes part of a Topology. It starts one or more executor threads to run the Topology's components (Spouts/Bolts), so at runtime a Topology may span one or more workers

Stream Grouping: defines how a stream's Tuples are distributed among a Bolt's tasks. The main grouping types are:

  • Shuffle grouping: Tuples are distributed to tasks at random
  • Fields grouping: the stream is partitioned by the specified fields, so Tuples with equal field values go to the same task
  • All grouping: every Bolt task receives a copy of each Tuple
  • Global grouping: the entire stream goes to a single task of the Bolt
  • None grouping: no particular grouping is required
  • Direct grouping: the producer of a Tuple decides which consumer task receives it
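To make the routing difference between shuffle and fields grouping concrete, here is a minimal, hypothetical sketch of the idea (not Storm's actual implementation): shuffle grouping picks a random target task, while fields grouping typically derives the task index from a hash of the grouping field, so equal field values always land on the same task:

```java
import java.util.Objects;
import java.util.Random;

// Illustrative sketch of two grouping strategies (not Storm's real code).
public class GroupingSketch {
    private static final Random random = new Random();

    // Shuffle grouping: a random task index, spreading load evenly
    static int shuffle(int numTasks) {
        return random.nextInt(numTasks);
    }

    // Fields grouping: a deterministic task index derived from the field value,
    // so the same word is always counted by the same task
    static int byField(Object fieldValue, int numTasks) {
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }
}
```

This determinism is why a word-count bolt would normally use fields grouping on "word": with shuffle grouping, counts for the same word could be split across tasks.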

Topology:

        In distributed mode, the topology must be packaged as a JAR and submitted to Nimbus to run. You do not need to bundle the dependency JARs into the package.

        Submit it on the nimbus server: ./bin/storm jar stormapp.jar java-main-class args


Example: word count

        Flow: input text -> split into words -> count word occurrences

1. The message source: Spout

import java.util.Map;
import java.util.Random;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

/**
 * The message source: randomly emits a sentence of text
 */
public class WordSpout extends BaseRichSpout {
	
	private SpoutOutputCollector collector;
	
	private static final String[] msgs=new String[]{
			"word spout",
			"hello storm",
			"java python",
			"first is java",
			"second is python",
			"do what you want to do"
	};
	
	private static final Random random=new Random(); 
	
	@Override
	public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
		System.out.println("word spout open method called. with conf="+conf);
		this.collector=collector;
	}

	@Override
	public void nextTuple() {
		String line=msgs[random.nextInt(msgs.length)];
		System.out.println("****************************word spout nextTuple method called.==="+line);
		this.collector.emit(new Values(line)); // emit a randomly chosen sentence
		try{
			Thread.sleep(2000);
		}catch(Exception ex){
			ex.printStackTrace();
		}
		System.out.println("execute once.");
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		System.out.println("word spout declareOutputFields method called.");
		declarer.declare(new Fields("sentence"));
	}
	
}

2. The message processors: Bolts

    Word splitting:

import java.util.Map;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.IBasicBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

/**
 * Receives data from the spout for further processing.
 * In this example: splits a sentence into individual words.
 */
public class WordSplitBolt implements IBasicBolt {

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		System.out.println("WordSplitBolt declareOutputFields called.");
		declarer.declare(new Fields("word"));
	}

	@Override
	public Map<String, Object> getComponentConfiguration() {
		System.out.println("WordSplitBolt getComponentConfiguration called.");
		return null;
	}

	@Override
	public void prepare(Map stormConf, TopologyContext context) {
		System.out.println("WordSplitBolt prepare called.");
	}

	@Override
	public void execute(Tuple input, BasicOutputCollector collector) {
		System.out.println("WordSplitBolt execute called.");
		String sentence=input.getString(0);
		for(String word:sentence.split(" ")){
			collector.emit(new Values(word));
		}
		
	}

	@Override
	public void cleanup() {
		System.out.println("WordSplitBolt cleanup called.");
	}

}

    Word counting:

import java.util.HashMap;
import java.util.Map;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.IBasicBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

/**
 * Word counting: keeps a running total for each word seen
 */
public class WordCountBolt implements IBasicBolt{
	
	// Note: a static HashMap is shared by all tasks of this bolt in the same
	// worker JVM and is not thread-safe; acceptable only for this single-task demo.
	private static Map<String,Integer> count=new HashMap<String,Integer>();

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		System.out.println("WordCountBolt declareOutputFields called.");
		declarer.declare(new Fields("word","count"));
	}

	@Override
	public Map<String, Object> getComponentConfiguration() {
		System.out.println("WordCountBolt getComponentConfiguration called.");
		return null;
	}

	@Override
	public void prepare(Map stormConf, TopologyContext context) {
		System.out.println("WordCountBolt prepare called.");
	}

	@Override
	public void execute(Tuple input, BasicOutputCollector collector) {
		System.out.println("WordCountBolt execute called.");
		String word=input.getString(0);
		int wordCount=0;
		if(count.containsKey(word)){
			wordCount=count.get(word).intValue();
		}
		wordCount++;
		count.put(word, wordCount);
		collector.emit(new Values(word,wordCount));
	}

	@Override
	public void cleanup() {
		System.out.println("WordCountBolt cleanup called.");
	}

}

3. The Topology

 3.1 A topology is composed of Spouts and Bolts, connected to each other by stream groupings.

 3.2 Once a topology task is started it runs continuously, until it is killed.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;

public class WordTopology {

	public static void main(String[] args) {
		System.out.println("start word topology.");
		TopologyBuilder builder=new TopologyBuilder();
		builder.setSpout("word-spout", new WordSpout());
		builder.setBolt("word-split", new WordSplitBolt()).shuffleGrouping("word-spout");
		builder.setBolt("word-count",new WordCountBolt()).shuffleGrouping("word-split");
		
		Config conf=new Config();
		conf.put("paramContext", "cust"); // custom parameters can be passed to components via the config
		conf.setDebug(true);
		
		LocalCluster cluster=new LocalCluster(); // run in local mode for testing
		cluster.submitTopology("word-count", conf, builder.createTopology());
		System.out.println("finished word topology.");
	}

}
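To see the split -> count data flow without starting a Storm runtime at all, the same logic the two bolts implement can be exercised in plain Java (a standalone sketch with hypothetical class names, not using the Storm API):

```java
import java.util.HashMap;
import java.util.Map;

// Standalone simulation of the word-count pipeline: each sentence goes
// through the same split-then-count logic that WordSplitBolt and
// WordCountBolt apply to tuples.
public class WordPipelineSim {
    private final Map<String, Integer> counts = new HashMap<>();

    // Equivalent of WordSplitBolt.execute feeding WordCountBolt.execute
    public void process(String sentence) {
        for (String word : sentence.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    public int countOf(String word) {
        return counts.getOrDefault(word, 0);
    }
}
```

Feeding it two of the spout's sample sentences, "first is java" and "second is python", yields a count of 2 for "is" and 1 for each of the other words, which is exactly what the running topology prints per tuple.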

4. Topology execution flow and method-call order

With the debug output above, the console shows the call sequence: each component's declareOutputFields is called while the topology is being built; when the topology starts, the Spout's open and each Bolt's prepare are called once; after that, nextTuple on the Spout and execute on the Bolts are invoked repeatedly as tuples flow through the topology, and cleanup is called when the topology shuts down.



Reposted from blog.csdn.net/seanme/article/details/79766599