1. Storm features
Storm is a free, open-source, distributed, real-time computation system with high throughput: on the order of a million tuples per second per node.
Storm is language-agnostic and scalable, offering low latency (results in seconds rather than the minutes or hours of batch jobs) and fault tolerance.
Storm vs. Hadoop
storm                               hadoop
---------------------------------------------
real-time stream processing         batch processing
stateless                           stateful
master/slave architecture           master/slave architecture
coordinated through ZooKeeper       without ZooKeeper
tens of thousands of messages       HDFS/MapReduce jobs take minutes
per second                          to hours
never stops on its own              always terminates eventually
2. Storm architecture
2.1 Core concepts
1.Tuple
the primary data structure: an ordered list of elements.
2.Stream
a sequence of Tuples.
3.Spouts
the source of a data stream, e.g. reading messages from a Kafka queue; can be user-defined.
4.Bolts
logical processing units. Spouts pass data to Bolts; a Bolt computes and may emit new tuples when done. IBolt is the interface.
5.Topology
Spouts and Bolts connected together form a topology: a directed graph whose vertices are computations and whose edges are data streams.
6.Task
each Spout or Bolt in a topology executes as one or more tasks.
Components
-------------------------------------------
Spout     //stream source ("spigot")
Bolt      //processing unit ("nozzle")
tuple     //the basic unit of data
stream    //a sequence of tuples
task      //an execution of a Spout or Bolt
executor  //a thread that runs tasks
topology  //the whole job, comparable to a MapReduce job
zookeeper //coordination service
2.2 Storm cluster
1.Nimbus
master node and core component; runs topologies.
Analyzes a topology, collects its tasks, and distributes them to supervisors.
Monitors topologies.
Stateless: relies on ZooKeeper to track topology status.
2.Supervisor
each supervisor manages n worker processes and delegates tasks to them.
Workers spawn executor threads that ultimately run the tasks. Storm uses an internal messaging system for communication between nimbus and supervisors. A supervisor accepts nimbus instructions and manages its worker processes to carry out the assigned tasks.
3.Worker
hosts the tasks of a topology; the worker itself does not execute tasks, it spawns executors that do.
4.Executor
essentially a thread spawned by a worker process. All tasks run by one executor belong to the same Spout or Bolt.
5.Task
performs the actual processing, as either a Spout or a Bolt.
Processes
----------------------------------------
nimbus     //the master
supervisor //a slave
worker     //worker process
core       //the web UI
logviewer  //log viewer
2.3 Storm workflow
1. Nimbus waits for a topology to be submitted.
2. Once a topology is submitted, nimbus collects its tasks.
3. Nimbus distributes the tasks to all available supervisors.
4. Supervisors periodically send heartbeats to nimbus to signal they are alive.
5. If a supervisor dies and stops sending heartbeats, nimbus reassigns its tasks to other supervisors.
6. If nimbus dies, supervisors keep executing their assigned tasks.
7. When a task completes, the supervisor waits for new tasks.
8. Meanwhile, a dead nimbus can be restarted automatically by a monitoring tool.
3. Installing a Storm cluster
[s201 ~ s204]
1. Install the JDK.
2. Unpack the Storm tarball.
3. Configure environment variables.
4. Verify the installation
$>source /etc/profile
$>./storm version
5. Distribute the installation files to the other nodes.
6. Configuration
[storm/conf/storm.yaml]
storm.local.dir: "/home/centos/storm"
storm.zookeeper.servers:
- "s202"
- "s203"
storm.zookeeper.port: 2181
### nimbus.* configs are for the master
nimbus.seeds : ["s201"]
### ui.* configs are for the master
ui.host: 0.0.0.0
ui.port: 8080
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
7. Distribute the configuration.
8. Start the processes
a) start the nimbus process on s201
$>storm nimbus &
b) start supervisor processes on s202 ~ s204
$>storm supervisor &
c) start the UI process on s201
$>storm ui &
9. Check through the web UI
http://s201:8080/
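Step 3 above mentions environment variables without showing them. A minimal sketch of the /etc/profile additions might look like the following (the /soft/storm install path is an assumption; adjust it to wherever the tarball was unpacked):

```shell
# assumed install location; adjust to your own layout
export STORM_HOME=/soft/storm
# put the storm launcher script on the PATH
export PATH=$PATH:$STORM_HOME/bin
```

After `source /etc/profile`, `storm version` should resolve from any directory.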
4. Example applications
4.1 CallLog statistics
0. Configure pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>cn.ctgu</groupId>
<artifactId>StormDemo</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>1.0.3</version>
</dependency>
</dependencies>
</project>
1. Create the Spout
package cn.ctgu.stormdemo;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
/**
 * Spout: generates the CallLog data stream
 */
public class CallLogSpout implements IRichSpout{
//Spout output collector
private SpoutOutputCollector collector;
//completion flag
private boolean completed = false;
//topology context
private TopologyContext context;
//random generator
private Random randomGenerator = new Random();
//emitted-tuple counter
private Integer idx = 0;
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
this.context = context;
this.collector = collector;
}
public void close() {
}
public void activate() {
}
public void deactivate() {
}
/**
 * emit the next tuple
 */
public void nextTuple() {
if (this.idx <= 1000) {
List<String> mobileNumbers = new ArrayList<String>();
mobileNumbers.add("1234123401");
mobileNumbers.add("1234123402");
mobileNumbers.add("1234123403");
mobileNumbers.add("1234123404");
Integer localIdx = 0;
while (localIdx++ < 100 && this.idx++ < 1000) {
//pick the caller
String caller = mobileNumbers.get(randomGenerator.nextInt(4));
//pick the callee
String callee = mobileNumbers.get(randomGenerator.nextInt(4));
while (callee.equals(caller)) {
//re-draw the callee until it differs from the caller
callee = mobileNumbers.get(randomGenerator.nextInt(4));
}
//simulated call duration (seconds)
Integer duration = randomGenerator.nextInt(60);
//emit the tuple
this.collector.emit(new Values(caller, callee, duration));
}
}
}
public void ack(Object msgId) {
}
public void fail(Object msgId) {
}
/**
 * declare the names of the output fields
 */
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("from", "to", "duration"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
2. Create the CallLogCreatorBolt
package cn.ctgu.stormdemo;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
 * Bolt that builds the CallLog record
 */
public class CallLogCreatorBolt implements IRichBolt {
//
private OutputCollector collector;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.collector = collector ;
}
public void execute(Tuple tuple) {
//process a call record
String from = tuple.getString(0);
String to = tuple.getString(1);
Integer duration = tuple.getInteger(2);
//emit a new tuple combining caller and callee
collector.emit(new Values(from + " - " + to, duration));
}
public void cleanup() {
}
/**
 * declare the output field names
 */
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("call", "duration"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
3. Create the CallLogCounterBolt
package cn.ctgu.stormdemo;
import org.apache.storm.task.IBolt;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import java.util.HashMap;
import java.util.Map;
/**
 * Bolt that counts call records
 */
public class CallLogCounterBolt implements IRichBolt{
Map<String, Integer> counterMap;
private OutputCollector collector;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.counterMap = new HashMap<String, Integer>();
this.collector = collector;
}
public void execute(Tuple tuple) {
String call = tuple.getString(0);
Integer duration = tuple.getInteger(1);
if (!counterMap.containsKey(call)) {
counterMap.put(call, 1);
} else {
Integer c = counterMap.get(call) + 1;
counterMap.put(call, c);
}
collector.ack(tuple);//acknowledge that execute() processed this tuple
}
public void cleanup() {
for (Map.Entry<String, Integer> entry : counterMap.entrySet()) {
System.out.println(entry.getKey() + " : " + entry.getValue());
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("call"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
4.App
package cn.ctgu.stormdemo;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
/**
* App
*/
public class App {
public static void main(String[] args) throws InterruptedException {
TopologyBuilder builder = new TopologyBuilder();
//set the Spout
builder.setSpout("spout", new CallLogSpout());
//set the creator Bolt
builder.setBolt("creator-bolt", new CallLogCreatorBolt()).shuffleGrouping("spout");
//set the counter Bolt
builder.setBolt("counter-bolt", new CallLogCounterBolt()).fieldsGrouping("creator-bolt", new Fields("call"));
Config conf = new Config();
conf.setDebug(true);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("LogAnalyserStorm", conf, builder.createTopology());
Thread.sleep(10000);
//stop the local cluster
cluster.shutdown();
}
}
5. Deploying a Storm topology on a production cluster
a) Change the submission mode
[App.java]
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
//set the Spout
builder.setSpout("spout", new CallLogSpout());
//set the creator Bolt
builder.setBolt("creator-bolt", new CallLogCreatorBolt()).shuffleGrouping("spout");
//set the counter Bolt
builder.setBolt("counter-bolt", new CallLogCounterBolt()).fieldsGrouping("creator-bolt", new Fields("call"));
Config conf = new Config();
conf.setDebug(true);
/**
* local mode (for testing)
*/
// LocalCluster cluster = new LocalCluster();
// cluster.submitTopology("LogAnalyserStorm", conf, builder.createTopology());
// Thread.sleep(10000);
//cluster mode (requires import org.apache.storm.StormSubmitter)
StormSubmitter.submitTopology("mytop", conf, builder.createTopology());
}
b) Package the project into a jar.
c) Run the topology on CentOS
$>storm jar xxx.jar cn.ctgu.stormdemo.App
4.2 Word count with Storm stream computing
Util.java
package cn.ctgu.stormdemo.util;
import java.io.IOException;
import java.io.OutputStream;
import java.lang.management.ManagementFactory;
import java.net.InetAddress;
import java.net.Socket;
import java.net.UnknownHostException;
public class Util {
//host name
public static String getHostname(){
try {
return InetAddress.getLocalHost().getHostName();
} catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
/*
* return the process pid
* */
public static String getPID(){
String info= ManagementFactory.getRuntimeMXBean().getName();
return info.split("@")[0];
}
//current thread name
public static String getTID(){
return Thread.currentThread().getName();
}
//object identity (class name + hash)
public static String getOID(Object obj){
String cname=obj.getClass().getSimpleName();
int hash=obj.hashCode();
return cname+"@"+hash;
}
//assemble the full info line
public static String info(Object obj,String msg){
//e.g. USER-20171114SP,8488,main,TestUtil@1160264930,hello world
return getHostname()+","+getPID()+","+getTID()+","+getOID(obj)+","+msg;
}
//send an info message over a socket to a remote collector (host "s1" is assumed; adjust to your cluster)
public static void sendToClient(Object obj,String msg){
sendToClient(obj, msg, 8888);
}
//overload used by the examples below: choose the collector port per component
public static void sendToClient(Object obj,String msg,int port){
try {
String info=info(obj,msg);
Socket sock=new Socket("s1",port);
OutputStream os=sock.getOutputStream();
os.write((info+"\r\n").getBytes());
os.flush();
os.close();
} catch (IOException e) {
e.printStackTrace();
}
}
//variant used by the grouping examples: send to a collector on localhost
public static void sendToLocalhost(Object obj,String msg){
try {
String info=info(obj,msg);
Socket sock=new Socket("localhost",8888);
OutputStream os=sock.getOutputStream();
os.write((info+"\r\n").getBytes());
os.flush();
os.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
1.WordCountSpout
package cn.ctgu.stormdemo.wc;
import cn.ctgu.stormdemo.util.Util;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
/**
* Spout
*/
public class WordCountSpout implements IRichSpout{
private TopologyContext context ;
private SpoutOutputCollector collector ;
private List<String> states ;
private Random r = new Random();
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
Util.sendToClient(this,"open()",7777);
this.context = context ;
this.collector = collector ;
states = new ArrayList<String>();
states.add("hello world tom");
states.add("hello world tomas");
states.add("hello world tomasLee");
states.add("hello world tomson");
}
public void close() {
}
public void activate() {
}
public void deactivate() {
}
public void nextTuple() {
Util.sendToClient(this, "nextTuple()",7777);
String line = states.get(r.nextInt(4));
collector.emit(new Values(line));
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
public void ack(Object msgId) {
}
public void fail(Object msgId) {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("line"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
2.SplitBolt
package cn.ctgu.stormdemo.wc;
import cn.ctgu.stormdemo.util.Util;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
 * Bolt that splits each line into (word, 1) tuples
 */
public class SplitBolt implements IRichBolt {
private TopologyContext context ;
private OutputCollector collector ;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
Util.sendToClient(this, "prepare()",8888);
this.context = context ;
this.collector = collector ;
}
public void execute(Tuple tuple) {
Util.sendToClient(this, "execute()",8888);
String line = tuple.getString(0);
String[] arr = line.split(" ");
for(String s : arr){
collector.emit(new Values(s,1));
}
}
public void cleanup() {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word","count"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
3.CountBolt
package cn.ctgu.stormdemo.wc;
import cn.ctgu.stormdemo.util.Util;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Tuple;
import java.util.HashMap;
import java.util.Map;
/**
 * Bolt that aggregates word counts
 */
public class CountBolt implements IRichBolt{
private Map<String,Integer> map ;
private TopologyContext context;
private OutputCollector collector;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
Util.sendToClient(this, "prepare()",9999);
this.context = context;
this.collector = collector;
map = new HashMap<String, Integer>();
}
public void execute(Tuple tuple) {
Util.sendToClient(this, "execute()",9999);
String word = tuple.getString(0);
Integer count = tuple.getInteger(1);
if(!map.containsKey(word)){
map.put(word,1);
}
else{
map.put(word,map.get(word) + count);
}
}
public void cleanup() {
for(Map.Entry<String,Integer> entry : map.entrySet()){
System.out.println(entry.getKey() + " : " + entry.getValue());
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
4.App
package cn.ctgu.stormdemo.wc;
import cn.ctgu.stormdemo.wc.WordCountSpout;
import cn.ctgu.stormdemo.wc.SplitBolt;
import cn.ctgu.stormdemo.wc.CountBolt;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
/**
* App
*/
public class App {
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
//set the Spout
builder.setSpout("wcspout", new WordCountSpout(),3).setNumTasks(3);
//set the split Bolt
builder.setBolt("split-bolt", new SplitBolt(),4).shuffleGrouping("wcspout").setNumTasks(4);
//set the counter Bolt
builder.setBolt("counter-bolt", new CountBolt(),5).fieldsGrouping("split-bolt", new Fields("word")).setNumTasks(5);
Config conf = new Config();
conf.setNumWorkers(2);
conf.setDebug(true);
/**
* local mode (for testing)
*/
// LocalCluster cluster = new LocalCluster();
// cluster.submitTopology("wc", conf, builder.createTopology());
// Thread.sleep(10000);
StormSubmitter.submitTopology("wordcount", conf, builder.createTopology());
// cluster.shutdown();
}
}
Setting topology parallelism and tasks
Configuring parallelism:
1. Set the number of worker processes
conf.setNumWorkers(1);
2. Set the number of executors
//parallelism hint for the Spout (number of executors)
builder.setSpout("wcspout", new WordCountSpout(),3);
//parallelism hint for the Bolt
builder.setBolt("split-bolt", new SplitBolt(),4)
3. Set the number of tasks
Each executor thread can run multiple tasks; the thread count should roughly match the number of CPU cores.
Tasks are distributed evenly across executors; by default each executor runs one task.
The task count equals the number of component instances (objects).
builder.setSpout("wcspout", new WordCountSpout(),3).setNumTasks(2);
//
builder.setBolt("split-bolt", new SplitBolt(),4).shuffleGrouping("wcspout").setNumTasks(3);
4. Parallelism == the sum of all task counts.
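As a rough sanity check of the numbers above, the relationships can be sketched in plain Java (a simplification, not Storm's actual scheduler; the class and method names are illustrative only):

```java
// Sketch of how Storm's parallelism numbers relate to each other.
public class ParallelismSketch {
    // total parallelism = sum of setNumTasks() over all components
    public static int totalTasks(int... taskCounts) {
        int sum = 0;
        for (int t : taskCounts) sum += t;
        return sum;
    }
    // executors are divided (roughly) evenly among worker processes
    public static int executorsPerWorker(int totalExecutors, int numWorkers) {
        return (int) Math.ceil((double) totalExecutors / numWorkers);
    }
    public static void main(String[] args) {
        // spout: 3 tasks, split bolt: 4, count bolt: 5 -> parallelism 12
        System.out.println(totalTasks(3, 4, 5));
        // 12 executors over 2 workers -> 6 per worker
        System.out.println(executorsPerWorker(12, 2));
    }
}
```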
5. Grouping strategies
5.1 Shuffle grouping
Tuples are distributed to the target tasks at random.
5.2 Fields grouping
Tuples are routed by hashing the value of the specified field(s); tuples with the same field value always go to the same Bolt task.
This grouping is prone to data skew, which can be avoided with two-stage aggregation.
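The routing idea behind fields grouping can be sketched as hash-modulo dispatch (a simplification; Storm's real implementation differs in detail, and the names here are illustrative):

```java
import java.util.Arrays;
import java.util.List;

// Sketch: route a tuple by the hash of its grouping field value.
public class FieldsGroupingSketch {
    public static int chooseTask(String fieldValue, int numTasks) {
        // the same field value always maps to the same task index
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }
    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world", "hello");
        for (String w : words) {
            System.out.println(w + " -> task " + chooseTask(w, 4));
        }
        // "hello" lands on the same task both times; a hot key can
        // therefore overload one task (data skew).
    }
}
```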
5.3 Two-stage aggregation to avoid skew
a) App entry class
[App.java]
package cn.ctgu.stormdemo.group.shuffle;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
/**
* App
*/
public class App {
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
//set the Spout
builder.setSpout("wcspout", new WordCountSpout()).setNumTasks(2);
//set the split Bolt
builder.setBolt("split-bolt", new SplitBolt(),3).shuffleGrouping("wcspout").setNumTasks(3);
//first-stage (shuffle) and second-stage (fields) counter Bolts
builder.setBolt("counter-1", new CountBolt(),3).shuffleGrouping("split-bolt").setNumTasks(3);
builder.setBolt("counter-2", new CountBolt(),2).fieldsGrouping("counter-1",new Fields("word")).setNumTasks(2);
Config conf = new Config();
conf.setNumWorkers(2);
conf.setDebug(true);
/**
* local mode (for testing)
*/
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("wc", conf, builder.createTopology());
//Thread.sleep(20000);
// StormSubmitter.submitTopology("wordcount", conf, builder.createTopology());
//cluster.shutdown();
}
}
WordCountSpout.java
package cn.ctgu.stormdemo.group.shuffle;
import cn.ctgu.stormdemo.util.Util;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
/**
* Spout
*/
public class WordCountSpout implements IRichSpout{
private TopologyContext context ;
private SpoutOutputCollector collector ;
private List<String> states ;
private Random r = new Random();
private int index = 0;
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
this.context = context ;
this.collector = collector ;
states = new ArrayList<String>();
states.add("hello world tom");
states.add("hello world tomas");
states.add("hello world tomasLee");
states.add("hello world tomson");
}
public void close() {
}
public void activate() {
}
public void deactivate() {
}
public void nextTuple() {
if(index < 3){
String line = states.get(r.nextInt(4));
collector.emit(new Values(line));
//Util.sendToLocalhost(this, line);
index ++ ;
}
}
public void ack(Object msgId) {
}
public void fail(Object msgId) {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("line"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
SplitBolt.java
package cn.ctgu.stormdemo.group.shuffle;
import cn.ctgu.stormdemo.util.Util;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
 * Bolt that splits each line into (word, 1) tuples
 */
public class SplitBolt implements IRichBolt {
private TopologyContext context ;
private OutputCollector collector ;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.context = context ;
this.collector = collector ;
}
public void execute(Tuple tuple) {
String line = tuple.getString(0);
//Util.sendToLocalhost(this, line);
String[] arr = line.split(" ");
for(String s : arr){
collector.emit(new Values(s,1));
}
}
public void cleanup() {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word","count"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
CountBolt.java
package cn.ctgu.stormdemo.group.shuffle;
import cn.ctgu.stormdemo.util.Util;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
/**
 * CountBolt: uses two-stage aggregation to mitigate data skew.
 * The first stage (shuffle-grouped) accumulates partial counts after the split;
 * the second stage (fields-grouped) merges them into the final per-word totals.
 */
public class CountBolt implements IRichBolt{
private Map<String,Integer> map ;
private TopologyContext context;
private OutputCollector collector;
private long lastEmitTime = 0 ;
private long duration = 5000 ;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.context = context;
this.collector = collector;
map = new HashMap<String, Integer>();
map = Collections.synchronizedMap(map);//wrap the map so execute() and the flush thread can share it safely
//background thread that periodically flushes the partial counts downstream
Thread t = new Thread(){
public void run() {
while(true){
emitData();
}
}
};
//mark t as a daemon thread so it will not block topology shutdown
t.setDaemon(true);
t.start();
}
private void emitData(){
//flush the accumulated partial counts
synchronized (map){
//emit every (word, count) pair accumulated in this window
for (Map.Entry<String, Integer> entry : map.entrySet()) {
//send the pair to the next stage
collector.emit(new Values(entry.getKey(), entry.getValue()));
}
//reset the window
map.clear();
}
//sleep until the next flush
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
public void execute(Tuple tuple) {
//extract the word
String word = tuple.getString(0);
Util.sendToLocalhost(this, word);
//extract its partial count
Integer count = tuple.getInteger(1);
if(!map.containsKey(word)){
map.put(word, count);
}
else{
map.put(word,map.get(word) + count);
}
}
public void cleanup() {
for(Map.Entry<String,Integer> entry : map.entrySet()){
System.out.println(entry.getKey() + " : " + entry.getValue());
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word","count"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
5.4 All grouping
Broadcast grouping: every task of the target Bolt receives a copy of each tuple.
builder.setBolt("split-bolt", new SplitBolt(),2).allGrouping("wcspout").setNumTasks(2);
App.java
package cn.ctgu.stormdemo.group.all;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
/**
* App
*/
public class App {
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
//set the Spout
builder.setSpout("wcspout", new WordCountSpout()).setNumTasks(2);
//set the split Bolt (allGrouping broadcasts each tuple to every task)
builder.setBolt("split-bolt", new SplitBolt(),2).allGrouping("wcspout").setNumTasks(2);
Config conf = new Config();
conf.setNumWorkers(2);
conf.setDebug(true);
/**
* local mode (for testing)
*/
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("wc", conf, builder.createTopology());
System.out.println("hello world llll");
//Thread.sleep(20000);
// StormSubmitter.submitTopology("wordcount", conf, builder.createTopology());
//cluster.shutdown();
}
}
WordCountSpout.java
package cn.ctgu.stormdemo.group.all;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
/**
* Spout
*/
public class WordCountSpout implements IRichSpout{
private TopologyContext context ;
private SpoutOutputCollector collector ;
private List<String> states ;
private Random r = new Random();
private int index = 0;
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
this.context = context ;
this.collector = collector ;
states = new ArrayList<String>();
states.add("hello world tom");
states.add("hello world tomas");
states.add("hello world tomasLee");
states.add("hello world tomson");
}
public void close() {
}
public void activate() {
}
public void deactivate() {
}
public void nextTuple() {
if(index < 3){
String line = states.get(r.nextInt(4));
collector.emit(new Values(line));
//Util.sendToLocalhost(this, line);
index ++ ;
}
}
public void ack(Object msgId) {
}
public void fail(Object msgId) {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("line"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
SplitBolt.java
package cn.ctgu.stormdemo.group.all;
import cn.ctgu.stormdemo.util.Util;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
 * Bolt that splits each line into (word, 1) tuples
 */
public class SplitBolt implements IRichBolt {
private TopologyContext context ;
private OutputCollector collector ;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.context = context ;
this.collector = collector ;
}
public void execute(Tuple tuple) {
String line = tuple.getString(0);
Util.sendToLocalhost(this,line);
String[] arr = line.split(" ");
for(String s : arr){
collector.emit(new Values(s,1));
}
}
public void cleanup() {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word","count"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
5.5 Direct grouping
The emitting component decides exactly which task of the consumer receives each tuple.
//a. emit the tuple with emitDirect()
//the mapping of taskId to component name is available via context.getTaskToComponent()
collector.emitDirect(taskId,new Values(line));
//b. declare directGrouping on the consumer.
builder.setBolt("split-bolt", new SplitBolt(),2).directGrouping("wcspout").setNumTasks(2);
App.java
package cn.ctgu.stormdemo.group.direct;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
/**
* App
*/
public class App {
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
//set the Spout
builder.setSpout("wcspout", new WordCountSpout()).setNumTasks(2);
//set the split Bolt
builder.setBolt("split-bolt", new SplitBolt(),2).directGrouping("wcspout").setNumTasks(2);
Config conf = new Config();
conf.setNumWorkers(2);
conf.setDebug(true);
/**
* local mode (for testing)
*/
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("wc", conf, builder.createTopology());
//Thread.sleep(20000);
// StormSubmitter.submitTopology("wordcount", conf, builder.createTopology());
//cluster.shutdown();
}
}
WordCountSpout.java
package cn.ctgu.stormdemo.group.direct;
import org.apache.storm.generated.Grouping;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
/**
* Spout
*/
public class WordCountSpout implements IRichSpout{
private TopologyContext context ;
private SpoutOutputCollector collector ;
private List<String> states ;
private Random r = new Random();
private int index = 0;
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
this.context = context ;
this.collector = collector ;
states = new ArrayList<String>();
states.add("hello world tom");
states.add("hello world tomas");
states.add("hello world tomasLee");
states.add("hello world tomson");
}
public void close() {
}
public void activate() {
}
public void deactivate() {
}
public void nextTuple() {
int taskId = 0 ;
Map<Integer, String> map = context.getTaskToComponent() ;
for(Map.Entry<Integer,String> e : map.entrySet()){
if(e.getValue().equals("split-bolt")){
taskId = e.getKey() ;
break ;
}
}
if(index < 3){
String line = states.get(r.nextInt(4));
collector.emitDirect(taskId,new Values(line));
//Util.sendToLocalhost(this, line);
index ++ ;
}
}
public void ack(Object msgId) {
}
public void fail(Object msgId) {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
//streams consumed via emitDirect must be declared as direct streams
declarer.declare(true, new Fields("line"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
SplitBolt.java
package cn.ctgu.stormdemo.group.direct;
import cn.ctgu.stormdemo.util.Util;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
 * Bolt that splits each line into (word, 1) tuples
 */
public class SplitBolt implements IRichBolt {
private TopologyContext context ;
private OutputCollector collector ;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.context = context ;
this.collector = collector ;
}
public void execute(Tuple tuple) {
String line = tuple.getString(0);
Util.sendToLocalhost(this,line);
String[] arr = line.split(" ");
for(String s : arr){
collector.emit(new Values(s,1));
}
}
public void cleanup() {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word","count"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
5.6 Global grouping
The target tasks are sorted and every tuple is sent to the task with the lowest taskId.
It behaves like a special case of direct grouping.
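The selection rule can be sketched in a few lines (the task list is a made-up example; in real Storm the taskIds come from the TopologyContext):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: global grouping always targets the lowest taskId of the consumer.
public class GlobalGroupingSketch {
    public static int targetTask(List<Integer> taskIds) {
        // all tuples go to the single smallest taskId
        return Collections.min(taskIds);
    }
    public static void main(String[] args) {
        List<Integer> boltTasks = Arrays.asList(5, 3, 7);
        System.out.println(targetTask(boltTasks)); // 3
    }
}
```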
App.java
package cn.ctgu.stormdemo.group.global;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
/**
 * global grouping is a special case of direct grouping: it always targets the lowest taskId
 */
public class App {
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
//set the Spout
builder.setSpout("wcspout", new WordCountSpout()).setNumTasks(2);
//set the split Bolt (globalGrouping sends every tuple to the lowest taskId)
builder.setBolt("split-bolt", new SplitBolt(),2).globalGrouping("wcspout").setNumTasks(2);
Config conf = new Config();
conf.setNumWorkers(2);
conf.setDebug(true);
/**
* local mode (for testing)
*/
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("wc", conf, builder.createTopology());
System.out.println("hello world");
//Thread.sleep(20000);
// StormSubmitter.submitTopology("wordcount", conf, builder.createTopology());
//cluster.shutdown();
}
}
WordCountSpout.java
package cn.ctgu.stormdemo.group.global;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
/**
* Spout
*/
public class WordCountSpout implements IRichSpout{
private TopologyContext context ;
private SpoutOutputCollector collector ;
private List<String> states ;
private Random r = new Random();
private int index = 0;
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
this.context = context ;
this.collector = collector ;
states = new ArrayList<String>();
states.add("hello world tom");
states.add("hello world tomas");
states.add("hello world tomasLee");
states.add("hello world tomson");
}
public void close() {
}
public void activate() {
}
public void deactivate() {
}
public void nextTuple() {
if(index < 3){
String line = states.get(r.nextInt(4));
collector.emit(new Values(line),index);
index ++ ;
}
}
public void ack(Object msgId) {
}
public void fail(Object msgId) {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("line"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
SplitBolt.java
package cn.ctgu.stormdemo.group.global;
import cn.ctgu.stormdemo.util.Util;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
 * Bolt that splits each line into (word, 1) tuples
 */
public class SplitBolt implements IRichBolt {
private TopologyContext context ;
private OutputCollector collector ;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.context = context ;
this.collector = collector ;
}
public void execute(Tuple tuple) {
String line = tuple.getString(0);
String[] arr = line.split(" ");
for(String s : arr){
collector.emit(new Values(s,1));
}
}
public void cleanup() {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word","count"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
5.7 Custom grouping
App.java
package cn.ctgu.stormdemo.group.custom;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
/**
* Custom grouping
*/
public class App {
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
//set the Spout
builder.setSpout("wcspout", new WordCountSpout()).setNumTasks(2);
//set the split Bolt with the custom grouping
builder.setBolt("split-bolt", new SplitBolt(),4).customGrouping("wcspout",new MyGrouping()).setNumTasks(4);
Config conf = new Config();
conf.setNumWorkers(2);
conf.setDebug(true);
/**
* local-mode storm
*/
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("wc", conf, builder.createTopology());
System.out.println("hello world");
}
}
WordCountSpout.java
package cn.ctgu.stormdemo.group.custom;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
/**
* Spout
*/
public class WordCountSpout implements IRichSpout{
private TopologyContext context ;
private SpoutOutputCollector collector ;
private List<String> states ;
private Random r = new Random();
private int index = 0;
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
this.context = context ;
this.collector = collector ;
states = new ArrayList<String>();
states.add("hello world tom");
states.add("hello world tomas");
states.add("hello world tomasLee");
states.add("hello world tomson");
}
public void close() {
}
public void activate() {
}
public void deactivate() {
}
public void nextTuple() {
if(index < 3){
String line = states.get(r.nextInt(4));
collector.emit(new Values(line),index);
index ++ ;
}
}
public void ack(Object msgId) {
}
public void fail(Object msgId) {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("line"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
SplitBolt.java
package cn.ctgu.stormdemo.group.custom;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
*/
public class SplitBolt implements IRichBolt {
private TopologyContext context ;
private OutputCollector collector ;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
System.out.println(this + " : prepare()");
this.context = context ;
this.collector = collector ;
}
public void execute(Tuple tuple) {
String line = tuple.getString(0);
System.out.println(this + " : execute() : " + line);
String[] arr = line.split(" ");
for(String s : arr){
collector.emit(new Values(s,1));
}
}
public void cleanup() {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word","count"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
MyGrouping.java
package cn.ctgu.stormdemo.group.custom;
import org.apache.storm.generated.GlobalStreamId;
import org.apache.storm.grouping.CustomStreamGrouping;
import org.apache.storm.task.WorkerTopologyContext;
import java.util.ArrayList;
import java.util.List;
/**
* Custom grouping
*/
public class MyGrouping implements CustomStreamGrouping {
//ids of the target (downstream) tasks
private List<Integer> targetTasks ;
public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) {
this.targetTasks = targetTasks ;
}
public List<Integer> chooseTasks(int taskId, List<Object> values) {
List<Integer> subTaskIds = new ArrayList<Integer>();
for(int i = 0 ; i <= targetTasks.size() / 2 ; i ++){
subTaskIds.add(targetTasks.get(i));
}
return subTaskIds;
}
}
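As written, chooseTasks routes every tuple to the first half (plus one) of the target task list, so with four downstream tasks only three ever receive data. That routing rule can be checked standalone, without a Storm runtime (a sketch; GroupingDemo is a hypothetical class name):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone re-implementation of the routing rule in MyGrouping.chooseTasks:
// every tuple goes to the first (size/2 + 1) tasks of the target list.
public class GroupingDemo {
    static List<Integer> chooseTasks(List<Integer> targetTasks) {
        List<Integer> subTaskIds = new ArrayList<Integer>();
        for (int i = 0; i <= targetTasks.size() / 2; i++) {
            subTaskIds.add(targetTasks.get(i));
        }
        return subTaskIds;
    }

    public static void main(String[] args) {
        // e.g. four split-bolt tasks with ids 4..7
        List<Integer> targets = Arrays.asList(4, 5, 6, 7);
        System.out.println(chooseTasks(targets)); // prints [4, 5, 6]
    }
}
```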
6. How storm ensures a message is fully processed
1. The emitted tuple must carry a msgId
collector.emit(new Values(line),index);
2. The bolt must acknowledge each tuple (ack() | fail())
public void execute(Tuple tuple) {
String line = tuple.getString(0);
System.out.println(this + " : " + line);
if(new Random().nextBoolean()){
//ack
collector.ack(tuple);
}
else{
//fail
collector.fail(tuple);
}
}
3. Implement the spout's ack() and fail() methods
public void ack(Object msgId) {
System.out.println(this + " : ack() : " + msgId);
}
public void fail(Object msgId) {
System.out.println(this + " : fail() : " + msgId);
}
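Steps 1-3 together give at-least-once delivery with a retry cap. The spout-side bookkeeping can be sketched without a Storm runtime (a minimal sketch; RetryTracker and its method names are hypothetical, not Storm APIs):

```java
import java.util.HashMap;
import java.util.Map;

// Simulates the spout-side retry policy used in the complete code below:
// on fail(msgId), re-emit until the message has failed MAX_RETRIES times, then drop it.
public class RetryTracker {
    static final int MAX_RETRIES = 3;
    private final Map<Long, Integer> failCounts = new HashMap<Long, Integer>();

    // Returns true if the caller should re-emit the tuple with the same msgId.
    boolean onFail(long msgId) {
        int retryCount = failCounts.getOrDefault(msgId, 0);
        if (retryCount >= MAX_RETRIES) {
            failCounts.remove(msgId);  // give up: drop the bookkeeping
            return false;
        }
        failCounts.put(msgId, retryCount + 1);
        return true;
    }

    void onAck(long msgId) {
        failCounts.remove(msgId);  // success: no more retries needed
    }
}
```

The timestamp-as-msgId scheme in the spout below maps directly onto this: `fail()` calls `onFail(ts)` and re-emits while it returns true.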
Complete code
App.java
package cn.ctgu.stormdemo.ensure;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
/**
* App
*/
public class App {
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
//set the Spout
builder.setSpout("wcspout", new WordCountSpout()).setNumTasks(1);
//set the split Bolt
builder.setBolt("split-bolt", new SplitBolt(),2).shuffleGrouping("wcspout").setNumTasks(2);
Config conf = new Config();
conf.setNumWorkers(2);
conf.setDebug(true);
/**
* local-mode storm
*/
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("wc", conf, builder.createTopology());
System.out.println("hello world llll");
}
}
WordCountSpout.java
package cn.ctgu.stormdemo.ensure;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.*;
/**
* Spout
*/
public class WordCountSpout implements IRichSpout{
private TopologyContext context ;
private SpoutOutputCollector collector ;
private List<String> states ;
private Random r = new Random();
private int index = 0;
//all in-flight messages, keyed by msgId (timestamp)
private Map<Long,String> messages = new HashMap<Long, String>();
//retry counts for failed messages
private Map<Long,Integer> failMessages = new HashMap<Long, Integer>();
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
this.context = context ;
this.collector = collector ;
states = new ArrayList<String>();
states.add("hello world tom");
states.add("hello world tomas");
states.add("hello world tomasLee");
states.add("hello world tomson");
}
public void close() {
}
public void activate() {
}
public void deactivate() {
}
public void nextTuple() {
if(index < 3){
String line = states.get(r.nextInt(4));
//take the current timestamp
long ts = System.currentTimeMillis() ;
messages.put(ts,line);
//emit the tuple, using ts as the message id
collector.emit(new Values(line),ts);
System.out.println(this + " : nextTuple() : " + line + " : " + ts);
index ++ ;
}
}
/**
* callback handling
*/
public void ack(Object msgId) {
//processed successfully; remove the retry bookkeeping.
Long ts = (Long)msgId ;
failMessages.remove(ts) ;
messages.remove(ts) ;
}
public void fail(Object msgId) {
//the timestamp is the msgId
Long ts = (Long)msgId;
//check how many times this message has already been retried
Integer retryCount = failMessages.get(ts);
retryCount = (retryCount == null ? 0 : retryCount) ;
//max retry count exceeded: give up
if(retryCount >= 3){
failMessages.remove(ts) ;
messages.remove(ts) ;
}
else{
//retry
collector.emit(new Values(messages.get(ts)),ts);
System.out.println(this + " : fail() : " + messages.get(ts) + " : " + ts);
retryCount ++ ;
failMessages.put(ts,retryCount);
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("line"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
SplitBolt.java
package cn.ctgu.stormdemo.ensure;
import cn.ctgu.stormdemo.util.Util;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
import java.util.Random;
/**
*/
public class SplitBolt implements IRichBolt {
private TopologyContext context ;
private OutputCollector collector ;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.context = context ;
this.collector = collector ;
}
public void execute(Tuple tuple) {
String line = tuple.getString(0);
if(new Random().nextBoolean()){
//ack
collector.ack(tuple);
System.out.println(this + " : ack() : " + line + " : "+ tuple.getMessageId().toString());
}
else{
//fail
collector.fail(tuple);
System.out.println(this + " : fail() : " + line + " : " + tuple.getMessageId().toString());
}
}
public void cleanup() {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word","count"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
7. Integrating storm with other components
7.1 Kafka + Storm
1. Description
Storm acts as a consumer, pulling messages from a kafka queue.
2. Add the storm-kafka dependency [pom.xml]
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>cn.ctgu</groupId>
<artifactId>StormDemo</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>1.0.3</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>1.0.2</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.1.1</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</project>
3. Start the kafka + storm clusters.
public class App {
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
//zk connection string
String zkConnString = "s202:2181" ;
//
BrokerHosts hosts = new ZkHosts(zkConnString);
//Spout configuration
SpoutConfig spoutConfig = new SpoutConfig(hosts, "test2", "/test2", UUID.randomUUID().toString());
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
builder.setSpout("kafkaspout", kafkaSpout).setNumTasks(2);
builder.setBolt("split-bolt", new SplitBolt(),2).shuffleGrouping("kafkaspout").setNumTasks(2);
Config conf = new Config();
conf.setNumWorkers(2);
conf.setDebug(true);
/**
* local-mode storm
*/
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("wc", conf, builder.createTopology());
}
}
4. Start the kafka cluster and storm, then use a producer to send messages to kafka.
5. Verify that storm consumes them.
Complete test code
App.java
package cn.ctgu.stormdemo.kafka;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.kafka.*;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;
import java.util.UUID;
/**
* App
*/
public class App {
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
//zk connection string
String zkConnString = "s202:2181" ;
//
BrokerHosts hosts = new ZkHosts(zkConnString);
//Spout configuration
SpoutConfig spoutConfig = new SpoutConfig(hosts, "test2", "/test2", UUID.randomUUID().toString());
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
builder.setSpout("kafkaspout", kafkaSpout).setNumTasks(2);
builder.setBolt("split-bolt", new SplitBolt(),2).shuffleGrouping("kafkaspout").setNumTasks(2);
Config conf = new Config();
conf.setNumWorkers(2);
conf.setDebug(true);
/**
* local-mode storm
*/
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("wc", conf, builder.createTopology());
}
}
SplitBolt.java
package cn.ctgu.stormdemo.kafka;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import java.util.Map;
/**
*/
public class SplitBolt implements IRichBolt {
private TopologyContext context ;
private OutputCollector collector ;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.context = context ;
this.collector = collector ;
}
public void execute(Tuple tuple) {
String line = tuple.getString(0) ;
System.out.println(line);
}
public void cleanup() {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word","count"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
7.2 Integrating storm with hbase
1. Description
Write the computed results into the hbase database.
hbase offers high throughput, random access, and real-time reads/writes.
2. Create the hbase wordcount table with column family f1
$>hbase shell
$hbase shell>create 'ns1:wordcount' , 'f1'
3. Update pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>cn.ctgu</groupId>
<artifactId>StormDemo</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>1.0.3</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>1.0.2</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.1.1</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-hbase</artifactId>
<version>1.0.3</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.3</version>
</dependency>
</dependencies>
</project>
4.HbaseBolt
package cn.ctgu.stormdemo.hbase;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.hbase.bolt.HBaseBolt;
import org.apache.storm.hbase.bolt.mapper.SimpleHBaseMapper;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
/**
* App
*/
public class App {
private static final String WORD_SPOUT = "WORD_SPOUT";
private static final String COUNT_BOLT = "COUNT_BOLT";
private static final String HBASE_BOLT = "HBASE_BOLT";
public static void main(String[] args) throws Exception {
//hbase mapping
SimpleHBaseMapper mapper = new SimpleHBaseMapper()
.withRowKeyField("word") //rowkey
.withColumnFields(new Fields("word")) //column
.withCounterFields(new Fields("count")) //counter column
.withColumnFamily("f1"); //column family
HBaseBolt hbaseBolt = new HBaseBolt("ns1:wordcount", mapper);
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("wcspout", new WordCountSpout()).setNumTasks(1);
builder.setBolt("split-bolt", new SplitBolt(),2).shuffleGrouping("wcspout").setNumTasks(2);
builder.setBolt("hbase-bolt", hbaseBolt,2).fieldsGrouping("split-bolt",new Fields("word")).setNumTasks(2);
Config conf = new Config();
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("wc", conf, builder.createTopology());
}
}
5. Copy the hbase configuration files into resources
[resources]
hbase-site.xml
hdfs-site.xml
6. Run
Start the hbase cluster + storm.
7. Inspect the hbase table data
$hbase>get_counter 'ns1:wordcount' , 'word' , 'f1:count'
Complete code
App.java
package cn.ctgu.stormdemo.hbase;
import cn.ctgu.stormdemo.group.wc.shuffle.CountBolt;
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
public class App {
public static void main(String[] args) throws Exception {
TopologyBuilder builder=new TopologyBuilder();
//set the spout
builder.setSpout("wcspout",new WordCountSpout());
//set the split Bolt
builder.setBolt("split-bolt",new SplitBolt(),2).shuffleGrouping("wcspout").setNumTasks(2);
//set the HbaseBolt
builder.setBolt("counter-bolt",new HbaseBolt(),2).shuffleGrouping("split-bolt").setNumTasks(2);
Config conf=new Config();
conf.setDebug(true);
/* //local-mode storm
LocalCluster cluster=new LocalCluster();
cluster.submitTopology("wc",conf,builder.createTopology());
Thread.sleep(10000);
//stop the cluster manually: storm never stops on its own, and the Bolt's cleanup() only runs when the cluster is shut down
cluster.shutdown();*/
StormSubmitter.submitTopology("wordcount",conf,builder.createTopology());
}
}
WordCountSpout.java
package cn.ctgu.stormdemo.hbase;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
public class WordCountSpout implements IRichSpout{
private TopologyContext context;
private SpoutOutputCollector collector;
private List<String> stats;
private Random r=new Random();
private int index=0;
public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
// Util.sendToClient(this,"open()");
this.context=topologyContext;
this.collector=spoutOutputCollector;
//seed data
stats=new ArrayList<String>();
stats.add("hello world tom");
stats.add("hello world tomas");
stats.add("hello world tomasLee");
stats.add("hello world tomson");
try {
Thread.sleep(2000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
public void close() {
}
public void activate() {
}
public void deactivate() {
}
//emit a random line of data
public void nextTuple() {
if(index<5){
String line=stats.get(r.nextInt(4));
collector.emit(new Values(line));
System.out.println(this+":"+line);
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
index++;
}
}
public void ack(Object o) {
}
public void fail(Object o) {
}
//declare the output fields
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("lines"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
SplitBolt.java
package cn.ctgu.stormdemo.hbase;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
public class SplitBolt implements IRichBolt{
private TopologyContext context;
private OutputCollector collector;
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
this.context=topologyContext;
this.collector=outputCollector;
}
//process each line received from the spout
public void execute(Tuple tuple) {
// Util.sendToClient(this,"execute()");
String line=tuple.getString(0);
System.out.println(this+":"+line);
String[]arr=line.split(" ");//split the line on spaces
for(String s:arr){
collector.emit(new Values(s,1));//emit downstream
}
}
public void cleanup() {
}
//declare the fields of the emitted tuples so the next bolt can receive them
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word","count"));
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
HbaseBolt.java
package cn.ctgu.stormdemo.hbase;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Tuple;
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
import java.util.Map;
//HbaseBolt: writes results into the HBase database
public class HbaseBolt implements IRichBolt{
private Table t;
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
try {
Configuration conf= HBaseConfiguration.create();
Connection conn= ConnectionFactory.createConnection(conf);
TableName tname=TableName.valueOf("ns1:wordcount");
t=conn.getTable(tname);
} catch (Exception e) {
e.printStackTrace();
}
}
public void execute(Tuple tuple) {
String word=tuple.getString(0);
Integer count=tuple.getInteger(1);
//use hbase's increment mechanism to do the wordcount tally
byte[]rowkey= Bytes.toBytes(word);
byte[]f=Bytes.toBytes("f1");
byte[]c=Bytes.toBytes("count");
try {
t.incrementColumnValue(rowkey,f,c,count);
} catch (Exception e) {
e.printStackTrace();
}
}
public void cleanup() {
}
public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
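The increment-based counting above needs a live HBase cluster to observe. Its semantics can be checked locally with a map-backed stand-in for incrementColumnValue (a sketch; CounterSim is a hypothetical name, not an HBase API):

```java
import java.util.HashMap;
import java.util.Map;

// Simulates HBase's incrementColumnValue: each (rowkey, family:qualifier) cell
// holds a long counter that starts at 0 and is incremented by the given amount.
public class CounterSim {
    private final Map<String, Long> cells = new HashMap<String, Long>();

    long incrementColumnValue(String rowkey, String family, String qualifier, long amount) {
        String key = rowkey + "/" + family + ":" + qualifier;
        long next = cells.getOrDefault(key, 0L) + amount;
        cells.put(key, next);
        return next;
    }

    public static void main(String[] args) {
        CounterSim t = new CounterSim();
        // Feed a line through the same split-and-count path as the bolts above:
        for (String w : "hello world hello".split(" ")) {
            t.incrementColumnValue(w, "f1", "count", 1);
        }
        System.out.println(t.incrementColumnValue("hello", "f1", "count", 0)); // prints 2
    }
}
```

This is why the HbaseBolt needs no local state: each word's count lives entirely in the (idempotent-to-read, atomic-to-update) counter cell, so multiple bolt tasks can increment the same row concurrently.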