Personalized Video Recommendation System
1. Personalized recommendation and its applications
2. Video recommendation systems
3. Introduction to recommender systems
The backend system runs a full batch over the data once a day, analyzing users' past viewing records; the aggregated statistics then serve as the basis for recommendations, and videos are recommended to each user individually to raise the chance that they watch.
Problems with this approach:
1. Single-machine deployment
2. Limited data-processing capacity; cannot scale out
3. No real-time processing
What is Storm?
1. Storm is a distributed real-time computation system open-sourced by Twitter.
2. Typical use cases: real-time data analytics, continuous computation, distributed RPC, and so on.
Advantages of Storm:
1. Distributed
2. Scalable
3. Highly reliable
4. Simple programming model
5. Efficient and real-time
Download the Storm package
Installing Storm
Environment: CentOS 6.4
Software:
jzmq-master — the Java-to-C++ bridge; with it, Java code can use zeromq
storm-0.8.2
zeromq-2.1.7 — billed as one of the fastest message queues around (written in C++)
zookeeper-3.4.5
1. Build and install ZeroMQ:
tar -xzf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./configure
# The build may fail with: configure: error: Unable to find a working C++ compiler
# Install the missing dependency rpms: libstdc++-devel gcc-c++
If the machine has Internet access:
yum install gcc-c++
If the VM has no Internet access, first download the rpms from http://mirrors.163.com/centos/6.4/os/x86_64/Packages/
rpm -i libstdc++-devel-4.4.7-3.el6.x86_64.rpm
rpm -i gcc-c++-4.4.7-3.el6.x86_64.rpm
rpm -i libuuid-devel-2.17.2-12.9.el6.x86_64.rpm
./configure
make
make install
2. Build and install JZMQ:
cd jzmq
./autogen.sh
# Error: autogen.sh: error: could not find libtool. libtool is required to run autogen.sh (libtool is missing)
yum install libtool
Or install the packages manually:
rpm -i autoconf-2.63-5.1.el6.noarch.rpm
rpm -i automake-1.11.1-4.el6.noarch.rpm
rpm -i libtool-2.2.6-15.5.el6.x86_64.rpm
./configure
make
make install
3. Build and install Python (Storm's launcher scripts are written in Python)
tar -zxvf Python-2.6.6.tgz
cd Python-2.6.6
./configure
make
make install
Download and unpack a Storm release
Next, install the Storm release on the Nimbus and Supervisor machines.
1. Download a Storm release:
wget https://dl.dropbox.com/u/133901206/storm-0.8.2.zip
2. Unpack it into the installation directory:
unzip storm-0.8.2.zip
Edit the storm.yaml configuration file
The Storm release ships with a conf/storm.yaml file for configuring Storm; the defaults live in defaults.yaml, and any option set in conf/storm.yaml overrides the corresponding default.
The following options must be set in conf/storm.yaml:
1) storm.zookeeper.servers: the addresses of the ZooKeeper cluster used by the Storm cluster, in the following format:
storm.zookeeper.servers:
- "111.222.333.444"
- "555.666.777.888"
If your ZooKeeper cluster does not listen on the default port, you also need to set storm.zookeeper.port.
2) storm.local.dir: a local disk directory where the Nimbus and Supervisor processes store a small amount of state (jars, confs, and so on). Create the directory in advance and grant sufficient access permissions,
then configure it in storm.yaml, e.g.:
storm.local.dir: "/usr/storm/workdir"
3) java.library.path: the load path for the native libraries Storm uses (ZMQ and JZMQ). The default is "/usr/local/lib:/opt/local/lib:/usr/lib";
since ZMQ and JZMQ are normally installed under /usr/local/lib, this usually needs no change.
4) nimbus.host: the address of the Nimbus machine. Each Supervisor worker node needs to know which machine is Nimbus so it can download topology jars, confs, and other files, e.g.:
nimbus.host: "111.222.333.444"
5) supervisor.slots.ports: for each Supervisor worker node, configure how many workers the node may run. Each worker uses its own port to receive messages,
and this option defines which ports are available to workers. By default each node can run 4 workers, on ports 6700, 6701, 6702, and 6703, e.g.:
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
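Putting the five required options together, a complete conf/storm.yaml for a small cluster might look like the sketch below (the ZooKeeper addresses, Nimbus host, and local dir are placeholders from the examples above; adapt them to your machines):

```yaml
storm.zookeeper.servers:
    - "111.222.333.444"
    - "555.666.777.888"
# storm.zookeeper.port: 2181    # only needed if ZooKeeper is not on its default port
storm.local.dir: "/usr/storm/workdir"
nimbus.host: "111.222.333.444"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
```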
2.5 Starting the Storm daemons
The final step is to start all of Storm's daemons. Like ZooKeeper, Storm is a fail-fast system: it can be stopped at any moment and will resume correctly when its processes are restarted. This is also why Storm keeps no state inside its processes; even if Nimbus or the Supervisors are restarted, running topologies are unaffected.
Start each Storm daemon as follows:
Nimbus: on the Storm master node, run "bin/storm nimbus >/dev/null 2>&1 &" to launch the Nimbus daemon in the background;
Supervisor: on each Storm worker node, run "bin/storm supervisor >/dev/null 2>&1 &" to launch the Supervisor daemon in the background;
UI: on the Storm master node, run "bin/storm ui >/dev/null 2>&1 &" to launch the UI daemon in the background; once it is up, visit http://{nimbus host}:8080 to see worker resource usage, topology status, and other cluster information.
Notes:
Once the Storm daemons are started, each process writes its log files under the logs/ subdirectory of the Storm installation directory.
In our testing, the Storm UI must run on the same machine as Nimbus, otherwise the UI does not work, because the UI process checks for a Nimbus link on the local machine.
For convenience, add bin/storm to your PATH.
At this point the Storm cluster is deployed and configured, and you can submit topologies to it.
3. Submitting jobs to the cluster
1) Start a Storm topology:
storm jar allmycode.jar org.me.MyTopology arg1 arg2 arg3
Here allmycode.jar is the jar containing the topology implementation, the main method of org.me.MyTopology is the topology's entry point, and arg1, arg2, and arg3 are arguments passed to org.me.MyTopology when it runs.
2) Stop a Storm topology:
storm kill {toponame}
where {toponame} is the topology name that was specified when the topology was submitted to the cluster.
Storm examples
Storm's WordCount example
Purpose: count how many times each word occurs in a text file.
For example, if a.txt contains:
storm hive hadoop storm
hadoop storm
then the counts are:
hadoop :2
hive :1
storm :3
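Stripped of the Storm plumbing, the logic that WordSpliter and WordCounter implement together is just tokenize-and-tally. A minimal plain-Java sketch of that core (the class and method names here are ours, not part of the example project):

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    // Split on whitespace, lowercase each word, and tally occurrences —
    // the same steps WordSpliter and WordCounter perform as separate bolts.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counters = new HashMap<String, Integer>();
        for (String word : text.split("\\s+")) {
            word = word.trim().toLowerCase();
            if (word.isEmpty()) {
                continue;
            }
            Integer c = counters.get(word);
            counters.put(word, c == null ? 1 : c + 1);
        }
        return counters;
    }

    public static void main(String[] args) {
        // Same input as a.txt above
        Map<String, Integer> result = count("storm hive hadoop storm\nhadoop storm");
        for (Map.Entry<String, Integer> e : result.entrySet()) {
            System.out.println(e.getKey() + " :" + e.getValue());
        }
    }
}
```

In the topology, the same work is split across two bolts so that splitting and counting can be parallelized independently.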
A data-filtering example
1. Topology
WordCountTopo.java
package cn.newbies.storm.topology;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import cn.newbies.storm.bolt.WordCounter;
import cn.newbies.storm.bolt.WordSpliter;
import cn.newbies.storm.spout.WordReader;

public class WordCountTopo {
    /**
     * Storm word count demo
     *
     * @param args
     */
    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: inputPath timeOffset");
            System.err.println("such as: java -jar WordCount.jar D://input/ 2");
            System.exit(2);
        }
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-reader", new WordReader());
        builder.setBolt("word-splitter", new WordSpliter()).shuffleGrouping("word-reader");
        builder.setBolt("word-counter", new WordCounter()).shuffleGrouping("word-splitter");
        String inputPath = args[0];
        String timeOffset = args[1];
        Config conf = new Config();
        conf.put("INPUT_PATH", inputPath);
        conf.put("TIME_OFFSET", timeOffset);
        conf.setDebug(false);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("WordCount", conf, builder.createTopology());
    }
}
FileWriterTopo.java
package cn.newbies.storm.topology;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import cn.newbies.storm.bolt.FieldsGroupingBolt;
import cn.newbies.storm.bolt.WordCounter;
import cn.newbies.storm.bolt.WriterBolt;
import cn.newbies.storm.spout.MetaSpout;
import cn.newbies.storm.spout.StringScheme;
import cn.newbies.storm.utils.PropertyUtil;
import com.taobao.metamorphosis.client.MetaClientConfig;
import com.taobao.metamorphosis.client.consumer.ConsumerConfig;
import com.taobao.metamorphosis.utils.ZkUtils.ZKConfig;

public class FileWriterTopo {
    /**
     * Topology that writes incoming data to text files
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: topic group");
            System.exit(2);
        }
        String topic = args[0];
        String group = args[1];
        // Expected config.properties entries:
        // zkConnect=master:2181
        // zkSessionTimeoutMs=30000
        // zkConnectionTimeoutMs=30000
        // zkSyncTimeMs=5000
        //
        // scheme=date,id,content
        // separator=,
        // target=date

        // Configure the ZooKeeper connection
        final ZKConfig zkConfig = new ZKConfig();
        zkConfig.zkConnect = PropertyUtil.getProperty("zkConnect");
        zkConfig.zkSessionTimeoutMs = Integer.parseInt(PropertyUtil.getProperty("zkSessionTimeoutMs"));
        zkConfig.zkConnectionTimeoutMs = Integer.parseInt(PropertyUtil.getProperty("zkConnectionTimeoutMs"));
        zkConfig.zkSyncTimeMs = Integer.parseInt(PropertyUtil.getProperty("zkSyncTimeMs"));
        // Configure the MetaQ client
        final MetaClientConfig metaClientConfig = new MetaClientConfig();
        metaClientConfig.setZkConfig(zkConfig);
        ConsumerConfig consumerConfig = new ConsumerConfig(group);
        TopologyBuilder builder = new TopologyBuilder();
        Config conf = new Config();
        conf.setNumWorkers(2);
        builder.setSpout("meta-spout", new MetaSpout(metaClientConfig, topic, consumerConfig, new StringScheme()));
        builder.setBolt("field-grouping-bolt", new FieldsGroupingBolt(), 2).shuffleGrouping("meta-spout");
        builder.setBolt("file-writer-bolt", new WriterBolt(), 4).fieldsGrouping("field-grouping-bolt", new Fields("partition"));
        StormSubmitter.submitTopology("fileWriter", conf, builder.createTopology());
    }
}
2. utils
PropertyUtil.java
package cn.newbies.storm.utils;

import java.io.InputStream;
import java.util.Properties;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**
 * Utility for reading configuration properties
 */
public class PropertyUtil {
    private static final Log log = LogFactory.getLog(PropertyUtil.class);
    private static Properties pros = new Properties();

    // Load the properties file from the classpath
    static {
        try {
            InputStream in = PropertyUtil.class.getClassLoader().getResourceAsStream("config.properties");
            pros.load(in);
        } catch (Exception e) {
            log.error("load configuration error", e);
        }
    }

    /**
     * Read a property value from the configuration file
     * @param key
     * @return
     */
    public static String getProperty(String key) {
        return pros.getProperty(key);
    }
}
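PropertyUtil expects a config.properties file on the classpath. Based on the keys this project reads (and the commented-out values shown in FileWriterTopo), the file might look like the sketch below; the values are illustrative and must match your own environment:

```properties
# ZooKeeper connection settings used by FileWriterTopo
zkConnect=master:2181
zkSessionTimeoutMs=30000
zkConnectionTimeoutMs=30000
zkSyncTimeMs=5000

# Record layout used by FieldsGroupingBolt
scheme=date,id,content
separator=,
target=date
```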
3. spout
MetaMessageWrapper.java
/*
* (C) 2007-2012 Alibaba Group Holding Limited.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
* Authors:
* wuhua <[email protected]> , boyan <[email protected]>
*/
package cn.newbies.storm.spout;

import java.util.concurrent.CountDownLatch;
import com.taobao.metamorphosis.Message;

/**
 * Wrapper for a Meta message, paired with a CountDownLatch
 *
 * @author boyan([email protected])
 * @date 2011-11-8
 *
 */
public final class MetaMessageWrapper {
    public final Message message;
    public final CountDownLatch latch;
    public volatile boolean success = false;

    public MetaMessageWrapper(final Message message) {
        super();
        this.message = message;
        this.latch = new CountDownLatch(1);
    }
}
MetaSpout.java
/*
* (C) 2007-2012 Alibaba Group Holding Limited.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
* Authors:
* wuhua <[email protected]> , boyan <[email protected]>
*/
package cn.newbies.storm.spout;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import backtype.storm.spout.Scheme;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import com.taobao.gecko.core.util.LinkedTransferQueue;
import com.taobao.metamorphosis.Message;
import com.taobao.metamorphosis.client.MessageSessionFactory;
import com.taobao.metamorphosis.client.MetaClientConfig;
import com.taobao.metamorphosis.client.MetaMessageSessionFactory;
import com.taobao.metamorphosis.client.consumer.ConsumerConfig;
import com.taobao.metamorphosis.client.consumer.MessageConsumer;
import com.taobao.metamorphosis.client.consumer.MessageListener;
import com.taobao.metamorphosis.exception.MetaClientException;

/**
 * A Storm spout that consumes messages from Metamorphosis (MetaQ)
 *
 * @author boyan([email protected])
 * @date 2011-11-8
 *
 */
public class MetaSpout extends BaseRichSpout {
    private static final long serialVersionUID = 4382748324382L;
    public static final String FETCH_MAX_SIZE = "meta.fetch.max_size";
    public static final int DEFAULT_MAX_SIZE = 1024 * 1024;

    private String topic;
    private transient MessageConsumer messageConsumer;
    private transient MessageSessionFactory sessionFactory;
    private final MetaClientConfig metaClientConfig;
    private final ConsumerConfig consumerConfig;
    static final Log log = LogFactory.getLog(MetaSpout.class);
    private final Scheme scheme;

    /**
     * Time in milliseconds to wait for a message from the queue if there is no
     * message ready when the topology requests a tuple (via
     * {@link #nextTuple()}).
     */
    public static final long WAIT_FOR_NEXT_MESSAGE = 1L;

    private transient ConcurrentHashMap<Long, MetaMessageWrapper> id2wrapperMap;
    private transient SpoutOutputCollector collector;
    private transient LinkedTransferQueue<MetaMessageWrapper> messageQueue;

    public MetaSpout(final MetaClientConfig metaClientConfig, final String topic, final ConsumerConfig consumerConfig, final Scheme scheme) {
        super();
        this.metaClientConfig = metaClientConfig;
        this.consumerConfig = consumerConfig;
        this.topic = topic;
        this.scheme = scheme;
    }

    @Override
    @SuppressWarnings("rawtypes")
    public void open(final Map conf, final TopologyContext context, final SpoutOutputCollector collector) {
        Integer maxSize = (Integer) conf.get(FETCH_MAX_SIZE);
        if (maxSize == null) {
            log.warn("Using default FETCH_MAX_SIZE");
            maxSize = DEFAULT_MAX_SIZE;
        }
        this.id2wrapperMap = new ConcurrentHashMap<Long, MetaMessageWrapper>();
        this.messageQueue = new LinkedTransferQueue<MetaMessageWrapper>();
        try {
            this.collector = collector;
            this.setUpMeta(topic, maxSize);
        } catch (final MetaClientException e) {
            log.error("Setup meta consumer failed", e);
        }
    }

    private void setUpMeta(final String topic, final Integer maxSize) throws MetaClientException {
        this.sessionFactory = new MetaMessageSessionFactory(this.metaClientConfig);
        this.messageConsumer = this.sessionFactory.createConsumer(this.consumerConfig);
        this.messageConsumer.subscribe(topic, maxSize, new MessageListener() {
            @Override
            public void recieveMessages(final Message message) {
                final MetaMessageWrapper wrapper = new MetaMessageWrapper(message);
                MetaSpout.this.id2wrapperMap.put(message.getId(), wrapper);
                MetaSpout.this.messageQueue.offer(wrapper);
                try {
                    wrapper.latch.await();
                } catch (final InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // Consumption failed: mark the message for rollback so MetaQ redelivers it
                if (!wrapper.success) {
                    message.setRollbackOnly();
                }
            }

            @Override
            public Executor getExecutor() {
                return null;
            }
        }).completeSubscribe();
    }

    @Override
    public void close() {
        try {
            this.messageConsumer.shutdown();
        } catch (final MetaClientException e) {
            log.error("Shutdown consumer failed", e);
        }
        try {
            this.sessionFactory.shutdown();
        } catch (final MetaClientException e) {
            log.error("Shutdown session factory failed", e);
        }
    }

    @Override
    public void nextTuple() {
        if (this.messageConsumer != null) {
            try {
                final MetaMessageWrapper wrapper = this.messageQueue.poll(WAIT_FOR_NEXT_MESSAGE, TimeUnit.MILLISECONDS);
                if (wrapper == null) {
                    return;
                }
                final Message message = wrapper.message;
                this.collector.emit(this.scheme.deserialize(message.getData()), message.getId());
            } catch (final InterruptedException e) {
                // interrupted while waiting for message, big deal
            }
        }
    }

    @Override
    public void ack(final Object msgId) {
        if (msgId instanceof Long) {
            final long id = (Long) msgId;
            final MetaMessageWrapper wrapper = this.id2wrapperMap.remove(id);
            if (wrapper == null) {
                log.warn(String.format("don't know how to ack(%s: %s)", msgId.getClass().getName(), msgId));
                return;
            }
            wrapper.success = true;
            wrapper.latch.countDown();
        } else {
            log.warn(String.format("don't know how to ack(%s: %s)", msgId.getClass().getName(), msgId));
        }
    }

    @Override
    public void fail(final Object msgId) {
        if (msgId instanceof Long) {
            final long id = (Long) msgId;
            final MetaMessageWrapper wrapper = this.id2wrapperMap.remove(id);
            if (wrapper == null) {
                log.warn(String.format("don't know how to reject(%s: %s)", msgId.getClass().getName(), msgId));
                return;
            }
            wrapper.success = false;
            wrapper.latch.countDown();
        } else {
            log.warn(String.format("don't know how to reject(%s: %s)", msgId.getClass().getName(), msgId));
        }
    }

    @Override
    public void declareOutputFields(final OutputFieldsDeclarer declarer) {
        declarer.declare(this.scheme.getOutputFields());
    }

    public boolean isDistributed() {
        return true;
    }
}
StringScheme.java
/*
* (C) 2007-2012 Alibaba Group Holding Limited.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
* Authors:
* wuhua <[email protected]> , boyan <[email protected]>
*/
package cn.newbies.storm.spout;

import backtype.storm.spout.Scheme;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import java.io.UnsupportedEncodingException;
import java.util.List;

public class StringScheme implements Scheme {
    private static final long serialVersionUID = -1641199638262927802L;

    public List<Object> deserialize(byte[] bytes) {
        try {
            return new Values(new String(bytes, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }

    public Fields getOutputFields() {
        return new Fields("str");
    }
}
WordReader.java
package cn.newbies.storm.spout;

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.FileFilterUtils;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class WordReader extends BaseRichSpout {
    private static final long serialVersionUID = 2197521792014017918L;
    private String inputPath;
    private SpoutOutputCollector collector;

    @Override
    @SuppressWarnings("rawtypes")
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        inputPath = (String) conf.get("INPUT_PATH");
    }

    @Override
    public void nextTuple() {
        // List every file in the input directory that has not yet been processed (.bak files are skipped)
        Collection<File> files = FileUtils.listFiles(new File(inputPath), FileFilterUtils.notFileFilter(FileFilterUtils.suffixFileFilter(".bak")), null);
        for (File f : files) {
            try {
                List<String> lines = FileUtils.readLines(f, "UTF-8");
                for (String line : lines) {
                    collector.emit(new Values(line));
                }
                // Rename the file to .bak so it is not read again on the next call
                FileUtils.moveFile(f, new File(f.getPath() + System.currentTimeMillis() + ".bak"));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}
4. bolt
FieldsGroupingBolt.java
package cn.newbies.storm.bolt;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import cn.newbies.storm.utils.PropertyUtil;

public class FieldsGroupingBolt extends BaseBasicBolt {
    private static final long serialVersionUID = -3176498679732038154L;
    private String separator = ",";
    private List<String> fieldsList;
    private String target;

    @Override
    @SuppressWarnings("rawtypes")
    public void prepare(Map stormConf, TopologyContext context) {
        // e.g. scheme=date,id,content  separator=,  target=date
        String scheme = PropertyUtil.getProperty("scheme");
        separator = PropertyUtil.getProperty("separator");
        target = PropertyUtil.getProperty("target");
        fieldsList = Arrays.asList(scheme.split(","));
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String data = input.getString(0);
        if (data != null) {
            String[] lines = data.split(System.getProperty("line.separator"));
            for (String line : lines) {
                String[] fields = line.split(separator);
                // Extract the value of the target field and emit it as the partition key
                int index = fieldsList.indexOf(target);
                String fieldVal = fields[index];
                collector.emit(new Values(fieldVal, line));
            }
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("partition", "line"));
    }
}
WordCounter.java
package cn.newbies.storm.bolt;

import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class WordCounter extends BaseBasicBolt {
    private static final long serialVersionUID = 5683648523524179434L;
    private HashMap<String, Integer> counters = new HashMap<String, Integer>();

    @Override
    @SuppressWarnings("rawtypes")
    public void prepare(Map stormConf, TopologyContext context) {
        final long timeOffset = Long.parseLong(stormConf.get("TIME_OFFSET").toString());
        // Print the current counts from a background thread every timeOffset seconds
        new Thread(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    for (Entry<String, Integer> entry : counters.entrySet()) {
                        System.out.println(entry.getKey() + " : " + entry.getValue());
                    }
                    System.out.println("---------------------------------------");
                    try {
                        Thread.sleep(timeOffset * 1000);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        }).start();
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String str = input.getString(0);
        if (!counters.containsKey(str)) {
            counters.put(str, 1);
        } else {
            Integer c = counters.get(str) + 1;
            counters.put(str, c);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }
}
WordSpliter.java
package cn.newbies.storm.bolt;

import org.apache.commons.lang.StringUtils;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordSpliter extends BaseBasicBolt {
    private static final long serialVersionUID = -5653803832498574866L;

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String line = input.getString(0);
        String[] words = line.split(" ");
        for (String word : words) {
            word = word.trim();
            if (StringUtils.isNotBlank(word)) {
                word = word.toLowerCase();
                collector.emit(new Values(word));
            }
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
WriterBolt.java
package cn.newbies.storm.bolt;

import java.io.FileWriter;
import java.io.IOException;
import java.util.HashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class WriterBolt extends BaseBasicBolt {
    private static final long serialVersionUID = -8235524993337289148L;
    private static final Log log = LogFactory.getLog(WriterBolt.class);
    private HashMap<String, FileWriter> writerMap = new HashMap<String, FileWriter>();
    private ReadWriteLock lock = new ReentrantReadWriteLock();
    private String filePath = "/home/cloud/logs/";
    private String lineSeparator = System.getProperty("line.separator");

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String partition = input.getStringByField("partition");
        String line = input.getStringByField("line");
        // One FileWriter per partition, created lazily under the write lock
        lock.readLock().lock();
        FileWriter fileWriter = writerMap.get(partition);
        if (fileWriter == null) {
            lock.readLock().unlock();
            lock.writeLock().lock();
            try {
                if (writerMap.get(partition) == null) {
                    fileWriter = new FileWriter(filePath + partition, true);
                    writerMap.put(partition, fileWriter);
                } else {
                    // Another thread created the writer while we waited for the write lock
                    fileWriter = writerMap.get(partition);
                }
            } catch (IOException e) {
                log.error(e);
            } finally {
                lock.writeLock().unlock();
            }
            lock.readLock().lock();
        }
        try {
            fileWriter.write(line);
            fileWriter.write(lineSeparator);
            fileWriter.flush();
        } catch (IOException e) {
            log.error(e);
        } finally {
            lock.readLock().unlock();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }
}
Commonly used classes
BaseRichSpout (message producer)
BaseBasicBolt (message processor)
TopologyBuilder (builds the topology)
Values (wrap data in a Values object to send it to the next component)
Tuple (emitted data is wrapped in a Tuple; a component receives the previous component's messages through the tuple)
Config (configuration)
StormSubmitter / LocalCluster (topology submitters)
Developing your first Storm program
Storm's WordCount example