Getting Started with Storm (1)

A personalized video recommendation system

Recommendation system overview: a backend job performs a full batch pass over the data once a day, analyzing users' historical viewing records. The aggregated results serve as the basis for recommendations, and videos are then recommended to each user individually, increasing the likelihood that the user watches them.
 
Problems with this approach:
1. Single-machine deployment
2. Limited data volume; it cannot scale out
3. No real-time processing
 

What is Storm?

1. Storm is a distributed real-time computation system open-sourced by Twitter.
2. Typical use cases: real-time data analysis, continuous computation, distributed RPC, and so on.
 
Storm's strengths:
1. Distributed
2. Scalable
3. Highly reliable
4. Simple programming model
5. Efficient and real-time

Downloading the Storm package

1. Official site: http://storm-project.net/
2. Package structure overview
 

Installing Storm

Environment: CentOS 6.4
Software:
jzmq-master ----- the Java binding that bridges Java and C++; with it, Java programs can use ZeroMQ
storm-0.8.2
zeromq-2.1.7 ----- a high-performance messaging library, written in C++
zookeeper-3.4.5


1. Build and install ZeroMQ:
tar -xzf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./configure
# configure may fail with: configure: error: Unable to find a working C++ compiler
# install the missing RPM packages: libstdc++-devel gcc-c++
# if the machine has Internet access:
yum install gcc-c++
# if the VM cannot reach the Internet, first download the RPMs from http://mirrors.163.com/centos/6.4/os/x86_64/Packages/ and then:
rpm -i libstdc++-devel-4.4.7-3.el6.x86_64.rpm
rpm -i gcc-c++-4.4.7-3.el6.x86_64.rpm
rpm -i libuuid-devel-2.17.2-12.9.el6.x86_64.rpm

./configure
make
make install



2. Build and install JZMQ:
cd jzmq
./autogen.sh
# may fail with: autogen.sh: error: could not find libtool. libtool is required to run autogen.sh.
yum install libtool
# or install the RPMs manually:
rpm -i autoconf-2.63-5.1.el6.noarch.rpm
rpm -i automake-1.11.1-4.el6.noarch.rpm
rpm -i libtool-2.2.6-15.5.el6.x86_64.rpm

./configure
make
make install




3. Build and install Python (Storm's launcher scripts are written in Python):
tar -zxvf Python-2.6.6.tgz
cd Python-2.6.6
./configure
make
make install



Download and unzip a Storm release
Next, install the Storm release on the Nimbus and Supervisor machines.
1. Download the Storm release:
wget https://dl.dropbox.com/u/133901206/storm-0.8.2.zip
2. Unzip it into the install directory:
unzip storm-0.8.2.zip
Edit the storm.yaml configuration file
The release contains a conf/storm.yaml file used to configure Storm; the default values can be viewed in defaults.yaml. Options set in conf/storm.yaml override the defaults in defaults.yaml.
The following options must be set in conf/storm.yaml:
1) storm.zookeeper.servers: the addresses of the ZooKeeper cluster that the Storm cluster uses, in the following format:

storm.zookeeper.servers:
  - "111.222.333.444"
  - "555.666.777.888"
If the ZooKeeper cluster does not listen on the default port, you also need to set the storm.zookeeper.port option.
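For example, to point Storm at a ZooKeeper ensemble listening on port 2182 instead of the default 2181 (the port number here is purely illustrative):

storm.zookeeper.port: 2182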

2) storm.local.dir: a local disk directory where the Nimbus and Supervisor daemons store a small amount of state (jars, confs, and so on). Create the directory in advance and grant it sufficient access permissions,
then point to it in storm.yaml, for example:
storm.local.dir: "/usr/storm/workdir"
3) java.library.path: the load path for the native libraries Storm uses (ZMQ and JZMQ). The default is "/usr/local/lib:/opt/local/lib:/usr/lib";
since ZeroMQ and JZMQ are normally installed under /usr/local/lib, this usually needs no change.
4) nimbus.host: the address of the Nimbus machine. Each Supervisor worker node needs to know which machine is Nimbus so that it can download topology jars, confs, and so on, for example:
nimbus.host: "111.222.333.444"
5) supervisor.slots.ports: for each Supervisor worker node, this configures how many workers the node may run. Each worker uses a dedicated port to receive messages,
and this option defines which ports are available to workers. By default, each node can run 4 workers, on ports 6700, 6701, 6702 and 6703, for example:
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
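
Putting the required options together, a minimal conf/storm.yaml might look as follows (the ZooKeeper/Nimbus addresses and the local directory are placeholders; substitute your own):

storm.zookeeper.servers:
  - "111.222.333.444"
storm.local.dir: "/usr/storm/workdir"
nimbus.host: "111.222.333.444"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703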
Starting the Storm daemons
The last step is to start all of Storm's daemons. Like ZooKeeper, Storm is a fail-fast system, so it can be stopped at any moment and will recover correctly once its processes restart. This is also why Storm keeps no state inside its processes: even if Nimbus or the Supervisors are restarted, running topologies are unaffected.

Start each daemon as follows:

Nimbus: on the Storm master node, run "bin/storm nimbus >/dev/null 2>&1 &" to start the Nimbus daemon in the background;
Supervisor: on each Storm worker node, run "bin/storm supervisor >/dev/null 2>&1 &" to start the Supervisor daemon in the background;
UI: on the Storm master node, run "bin/storm ui >/dev/null 2>&1 &" to start the UI daemon in the background. Once it is running, browse to http://{nimbus host}:8080 to see the cluster's worker resource usage, topology status, and other information.
Notes:

After the Storm daemons start, each process writes its log files under the logs/ subdirectory of the Storm install directory.
In testing, the Storm UI must be deployed on the same machine as Storm Nimbus, otherwise the UI does not work, because the UI process checks whether a local Nimbus connection exists.
For convenience, add bin/storm to your system PATH, as shown below.
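For example (assuming Storm was unzipped to /usr/storm/storm-0.8.2; adjust the path to your own install directory):

export PATH=$PATH:/usr/storm/storm-0.8.2/bin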
At this point the Storm cluster is deployed and configured, and you can submit topologies to it.

Submitting a topology to the cluster
1) Start a Storm topology:

storm jar allmycode.jar org.me.MyTopology arg1 arg2 arg3
Here allmycode.jar is the jar containing the topology implementation, the main method of org.me.MyTopology is the topology's entry point, and arg1, arg2 and arg3 are the arguments passed to org.me.MyTopology when it runs.

2) Stop a Storm topology:

storm kill {toponame}
Here {toponame} is the topology name that was specified when the topology was submitted to the Storm cluster.

 

Storm examples

The Storm WordCount example
Purpose: count the number of occurrences of each word in text files.
For example, if a.txt contains:
storm hive hadoop storm
hadoop storm

The counting result is:
hadoop :2
hive :1
storm :3

Data filtering example

The code below covers both examples, organized by component.

1. Topology
WordCountTopo.java
package cn.newbies.storm.topology;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import cn.newbies.storm.bolt.WordCounter;
import cn.newbies.storm.bolt.WordSpliter;
import cn.newbies.storm.spout.WordReader;

public class WordCountTopo {

	/**
	 * Storm word count demo
	 * 
	 * @param args
	 */
	public static void main(String[] args) {
		if (args.length != 2) {
			System.err.println("Usage: inputPaht timeOffset");
			System.err.println("such as : java -jar  WordCount.jar D://input/ 2");
			System.exit(2);
		}
		TopologyBuilder builder = new TopologyBuilder();
		builder.setSpout("word-reader", new WordReader());
		builder.setBolt("word-spilter", new WordSpliter()).shuffleGrouping("word-reader");
		builder.setBolt("word-counter", new WordCounter()).shuffleGrouping("word-spilter");
		String inputPaht = args[0];
		String timeOffset = args[1];
		Config conf = new Config();
		conf.put("INPUT_PATH", inputPaht);
		conf.put("TIME_OFFSET", timeOffset);
		conf.setDebug(false);
		LocalCluster cluster = new LocalCluster();
		cluster.submitTopology("WordCount", conf, builder.createTopology());

	}

}
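Note that WordCountTopo submits to a LocalCluster, which simulates a Storm cluster inside the current JVM and is convenient for development and debugging. FileWriterTopo below uses StormSubmitter instead, which submits the topology to a real cluster.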
FileWriterTopo.java
package cn.newbies.storm.topology;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import cn.newbies.storm.bolt.FieldsGroupingBolt;
import cn.newbies.storm.bolt.WordCounter;
import cn.newbies.storm.bolt.WriterBolt;
import cn.newbies.storm.spout.MetaSpout;
import cn.newbies.storm.spout.StringScheme;
import cn.newbies.storm.utils.PropertyUtil;

import com.taobao.metamorphosis.client.MetaClientConfig;
import com.taobao.metamorphosis.client.consumer.ConsumerConfig;
import com.taobao.metamorphosis.utils.ZkUtils.ZKConfig;

public class FileWriterTopo {

	/**
	 * Topology that writes data out to text files
	 * @param args
	 * @throws Exception 
	 */
	public static void main(String[] args) throws Exception {
		
		if (args.length != 2) {
			System.err.println("Usage: topic group");
			System.exit(2);
		}
		String topic = args[0];
		String group = args[1];
		
//		sample config.properties:
//		zkConnect=master:2181
//		zkSessionTimeoutMs=30000
//		zkConnectionTimeoutMs=30000
//		zkSyncTimeMs=5000
//
//		scheme=date,id,content
//		separator=,
//		target=date
		
		// ZooKeeper connection settings
		final ZKConfig zkConfig = new ZKConfig();
		zkConfig.zkConnect = PropertyUtil.getProperty("zkConnect");
		zkConfig.zkSessionTimeoutMs = Integer.parseInt(PropertyUtil.getProperty("zkSessionTimeoutMs"));
		zkConfig.zkConnectionTimeoutMs = Integer.parseInt(PropertyUtil.getProperty("zkConnectionTimeoutMs"));
		zkConfig.zkSyncTimeMs = Integer.parseInt(PropertyUtil.getProperty("zkSyncTimeMs"));

		// MetaQ client settings
		final MetaClientConfig metaClientConfig = new MetaClientConfig();
		metaClientConfig.setZkConfig(zkConfig);
		ConsumerConfig consumerConfig = new ConsumerConfig(group);
		
		TopologyBuilder builder = new TopologyBuilder();
		Config conf = new Config();
		conf.setNumWorkers(2);
		builder.setSpout("meta-spout", new MetaSpout(metaClientConfig, topic, consumerConfig, new StringScheme()));
		builder.setBolt("field-grouping-bolt", new FieldsGroupingBolt(), 2).shuffleGrouping("meta-spout");
		builder.setBolt("file-writer-bolt", new WriterBolt(), 4).fieldsGrouping("field-grouping-bolt", new Fields("partition"));
		StormSubmitter.submitTopology("fileWriter", conf, builder.createTopology());
		

	}

}
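The fieldsGrouping on the "partition" field guarantees that all tuples carrying the same partition value are routed to the same WriterBolt task, so each output file is written by exactly one task.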

2. utils

PropertyUtil.java
package cn.newbies.storm.utils;

import java.io.InputStream;
import java.util.Properties;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**
 * Utility for reading the properties configuration
 */
public class PropertyUtil {

	private static final Log log = LogFactory.getLog(PropertyUtil.class);
	private static Properties pros = new Properties();

	// Load the properties file at class load time
	static {
		try {
			InputStream in = PropertyUtil.class.getClassLoader().getResourceAsStream("config.properties");
			pros.load(in);
		} catch (Exception e) {
			log.error("load configuration error", e);
		}
	}

	/**
	 * Read a property value from the configuration file
	 * @param key
	 * @return
	 */
	public static String getProperty(String key) {
		return pros.getProperty(key);
	}

}

3. spout

MetaMessageWrapper.java
/*
 * (C) 2007-2012 Alibaba Group Holding Limited.
 * 
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 * Authors:
 *   wuhua <[email protected]> , boyan <[email protected]>
 */
package cn.newbies.storm.spout;

import java.util.concurrent.CountDownLatch;

import com.taobao.metamorphosis.Message;


/**
 * Wrapper around a Meta message, paired with a CountDownLatch
 * 
 * @author boyan([email protected])
 * @date 2011-11-8
 * 
 */
public final class MetaMessageWrapper {

    public final Message message;
    public final CountDownLatch latch;
    public volatile boolean success = false;


    public MetaMessageWrapper(final Message message) {
        super();
        this.message = message;
        this.latch = new CountDownLatch(1);
    }

}
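The latch is what ties MetaQ's consumer thread to Storm's ack mechanism: in MetaSpout below, the consumer thread blocks on latch.await() until the topology acks or fails the corresponding tuple, and the success flag then determines whether the message is committed or rolled back.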
MetaSpout.java
/*
 * (C) 2007-2012 Alibaba Group Holding Limited.
 * 
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 * Authors:
 *   wuhua <[email protected]> , boyan <[email protected]>
 */
package cn.newbies.storm.spout;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import backtype.storm.spout.Scheme;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;

import com.taobao.gecko.core.util.LinkedTransferQueue;
import com.taobao.metamorphosis.Message;
import com.taobao.metamorphosis.client.MessageSessionFactory;
import com.taobao.metamorphosis.client.MetaClientConfig;
import com.taobao.metamorphosis.client.MetaMessageSessionFactory;
import com.taobao.metamorphosis.client.consumer.ConsumerConfig;
import com.taobao.metamorphosis.client.consumer.MessageConsumer;
import com.taobao.metamorphosis.client.consumer.MessageListener;
import com.taobao.metamorphosis.exception.MetaClientException;

/**
 * A Storm spout that consumes messages from Metamorphosis (MetaQ)
 * 
 * @author boyan([email protected])
 * @date 2011-11-8
 * 
 */
public class MetaSpout extends BaseRichSpout {
	private static final long serialVersionUID = 4382748324382L;
	public static final String FETCH_MAX_SIZE = "meta.fetch.max_size";

	public static final int DEFAULT_MAX_SIZE = 1024 * 1024;

	private String topic;
	
	private transient MessageConsumer messageConsumer;

	private transient MessageSessionFactory sessionFactory;

	private final MetaClientConfig metaClientConfig;

	private final ConsumerConfig consumerConfig;

	static final Log log = LogFactory.getLog(MetaSpout.class);

	private final Scheme scheme;

	/**
	 * Time in milliseconds to wait for a message from the queue if there is no
	 * message ready when the topology requests a tuple (via
	 * {@link #nextTuple()}).
	 */
	public static final long WAIT_FOR_NEXT_MESSAGE = 1L;

	private transient ConcurrentHashMap<Long, MetaMessageWrapper> id2wrapperMap;
	private transient SpoutOutputCollector collector;

	private transient LinkedTransferQueue<MetaMessageWrapper> messageQueue;

	public MetaSpout(final MetaClientConfig metaClientConfig, final String topic, final ConsumerConfig consumerConfig, final Scheme scheme) {
		super();
		this.metaClientConfig = metaClientConfig;
		this.consumerConfig = consumerConfig;
		this.topic = topic;
		this.scheme = scheme;
	}

	@Override
	@SuppressWarnings("rawtypes")
	public void open(final Map conf, final TopologyContext context, final SpoutOutputCollector collector) {
		Integer maxSize = (Integer) conf.get(FETCH_MAX_SIZE);
		if (maxSize == null) {
			log.warn("Using default FETCH_MAX_SIZE");
			maxSize = DEFAULT_MAX_SIZE;
		}
		this.id2wrapperMap = new ConcurrentHashMap<Long, MetaMessageWrapper>();
		this.messageQueue = new LinkedTransferQueue<MetaMessageWrapper>();
		try {
			this.collector = collector;
			this.setUpMeta(topic, maxSize);
		} catch (final MetaClientException e) {
			log.error("Setup meta consumer failed", e);
		}
	}

	private void setUpMeta(final String topic, final Integer maxSize) throws MetaClientException {
		this.sessionFactory = new MetaMessageSessionFactory(this.metaClientConfig);
		this.messageConsumer = this.sessionFactory.createConsumer(this.consumerConfig);
		this.messageConsumer.subscribe(topic, maxSize, new MessageListener() {

			@Override
			public void recieveMessages(final Message message) {
				final MetaMessageWrapper wrapper = new MetaMessageWrapper(message);
				MetaSpout.this.id2wrapperMap.put(message.getId(), wrapper);
				MetaSpout.this.messageQueue.offer(wrapper);
				try {
					wrapper.latch.await();
				} catch (final InterruptedException e) {
					Thread.currentThread().interrupt();
				}
				// if consumption failed, mark the message for rollback so MetaQ redelivers it
				if (!wrapper.success) {
					message.setRollbackOnly();
				}
			}

			@Override
			public Executor getExecutor() {
				return null;
			}
		}).completeSubscribe();
	}

	@Override
	public void close() {
		try {
			this.messageConsumer.shutdown();
		} catch (final MetaClientException e) {
			log.error("Shutdown consumer failed", e);
		}
		try {
			this.sessionFactory.shutdown();
		} catch (final MetaClientException e) {
			log.error("Shutdown session factory failed", e);
		}

	}

	@Override
	public void nextTuple() {
		if (this.messageConsumer != null) {
			try {

				final MetaMessageWrapper wrapper = this.messageQueue.poll(WAIT_FOR_NEXT_MESSAGE, TimeUnit.MILLISECONDS);
				if (wrapper == null) {
					return;
				}
				final Message message = wrapper.message;
				this.collector.emit(this.scheme.deserialize(message.getData()), message.getId());
			} catch (final InterruptedException e) {
				// interrupted while waiting for message, big deal
			}
		}
	}

	@Override
	public void ack(final Object msgId) {
		if (msgId instanceof Long) {
			final long id = (Long) msgId;
			final MetaMessageWrapper wrapper = this.id2wrapperMap.remove(id);
			if (wrapper == null) {
				log.warn(String.format("don't know how to ack(%s: %s)", msgId.getClass().getName(), msgId));
				return;
			}
			wrapper.success = true;
			wrapper.latch.countDown();
		} else {
			log.warn(String.format("don't know how to ack(%s: %s)", msgId.getClass().getName(), msgId));
		}

	}

	@Override
	public void fail(final Object msgId) {
		if (msgId instanceof Long) {
			final long id = (Long) msgId;
			final MetaMessageWrapper wrapper = this.id2wrapperMap.remove(id);
			if (wrapper == null) {
				log.warn(String.format("don't know how to reject(%s: %s)", msgId.getClass().getName(), msgId));
				return;
			}
			wrapper.success = false;
			wrapper.latch.countDown();
		} else {
			log.warn(String.format("don't know how to reject(%s: %s)", msgId.getClass().getName(), msgId));
		}

	}

	@Override
	public void declareOutputFields(final OutputFieldsDeclarer declarer) {
		declarer.declare(this.scheme.getOutputFields());
	}

	public boolean isDistributed() {
		return true;
	}

}
StringScheme.java
/*
 * (C) 2007-2012 Alibaba Group Holding Limited.
 * 
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 * Authors:
 *   wuhua <[email protected]> , boyan <[email protected]>
 */
package cn.newbies.storm.spout;

import backtype.storm.spout.Scheme;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import java.io.UnsupportedEncodingException;
import java.util.List;

public class StringScheme implements Scheme {

	private static final long serialVersionUID = -1641199638262927802L;

	public List<Object> deserialize(byte[] bytes) {
        try {
            return new Values(new String(bytes, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }

    public Fields getOutputFields() {
        return new Fields("str");
    }
}
WordReader.java
package cn.newbies.storm.spout;

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.FileFilterUtils;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class WordReader extends BaseRichSpout {

	private static final long serialVersionUID = 2197521792014017918L;
	private String inputPath;
	
	private SpoutOutputCollector collector;

	@Override
	@SuppressWarnings("rawtypes")
	public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
		this.collector = collector;
		inputPath = (String) conf.get("INPUT_PATH");
	}

	@Override
	public void nextTuple() {
		Collection<File> files = FileUtils.listFiles(new File(inputPath), FileFilterUtils.notFileFilter(FileFilterUtils.suffixFileFilter(".bak")), null);
		for (File f : files) {
			try {
				List<String> lines = FileUtils.readLines(f, "UTF-8");
				for (String line : lines) {
					collector.emit(new Values(line));
				}
				FileUtils.moveFile(f, new File(f.getPath() + System.currentTimeMillis() + ".bak"));
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		declarer.declare(new Fields("line"));
	}

}
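Two details worth noting in WordReader: after a file is read, it is renamed with a .bak suffix so the next nextTuple() pass skips it (the file filter excludes .bak files); and tuples are emitted without message IDs, so Storm does not track or replay them — if the process dies mid-file, that data is simply lost.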

4. bolt

FieldsGroupingBolt.java
package cn.newbies.storm.bolt;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import cn.newbies.storm.utils.PropertyUtil;

public class FieldsGroupingBolt extends BaseBasicBolt {

	private static final long serialVersionUID = -3176498679732038154L;
	
	private String separator = ",";
	private List<String> fieldsList;
	private String target;
	
	@Override
	@SuppressWarnings("rawtypes")
	public void prepare(Map stormConf, TopologyContext context) {
		String scheme = PropertyUtil.getProperty("scheme");
		separator = PropertyUtil.getProperty("separator");
		target = PropertyUtil.getProperty("target");
		fieldsList = Arrays.asList(scheme.split(","));
		
	}

	@Override
	public void execute(Tuple input, BasicOutputCollector collector) {
		String data = input.getString(0);
		if(data != null){
			// a message may contain multiple lines; split on the JVM's line separator
			String[] lines = data.split(System.getProperty("line.separator"));
			for(String line : lines){
				String[] fields = line.split(separator);
				int index = fieldsList.indexOf(target);
				String fieldVal = fields[index];
				collector.emit(new Values(fieldVal, line));
			}
		}

	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		declarer.declare(new Fields("partition", "line"));
	}

}
WordCounter.java
package cn.newbies.storm.bolt;

import java.util.Map;
import java.util.Map.Entry;
import java.util.concurrent.ConcurrentHashMap;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class WordCounter extends BaseBasicBolt {

	private static final long serialVersionUID = 5683648523524179434L;

	private HashMap<String, Integer> counters = new HashMap<String, Integer>();

	@Override
	@SuppressWarnings("rawtypes")
	public void prepare(Map stormConf, TopologyContext context) {
		final long timeOffset = Long.parseLong(stormConf.get("TIME_OFFSET").toString());
		new Thread(new Runnable() {
			@Override
			public void run() {
				while (true) {
					for (Entry<String, Integer> entry : counters.entrySet()) {
						System.out.println(entry.getKey() + " : " + entry.getValue());
					}
					System.out.println("---------------------------------------");
					try {
						Thread.sleep(timeOffset * 1000);
					} catch (InterruptedException e) {
						e.printStackTrace();
					}
				}
			}
		}).start();

	}

	@Override
	public void execute(Tuple input, BasicOutputCollector collector) {
		String str = input.getString(0);
		if (!counters.containsKey(str)) {
			counters.put(str, 1);
		} else {
			Integer c = counters.get(str) + 1;
			counters.put(str, c);
		}

	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {

	}

}
WordSpliter.java
package cn.newbies.storm.bolt;

import org.apache.commons.lang.StringUtils;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordSpliter extends BaseBasicBolt {

	private static final long serialVersionUID = -5653803832498574866L;

	@Override
	public void execute(Tuple input, BasicOutputCollector collector) {
		String line = input.getString(0);
		String[] words = line.split(" ");
		for (String word : words) {
			word = word.trim();
			if (StringUtils.isNotBlank(word)) {
				word = word.toLowerCase();
				collector.emit(new Values(word));
			}
		}
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		declarer.declare(new Fields("word"));

	}

}
WriterBolt.java
package cn.newbies.storm.bolt;

import java.io.FileWriter;
import java.io.IOException;
import java.util.HashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class WriterBolt extends BaseBasicBolt {

	private static final Log log = LogFactory.getLog(WriterBolt.class);
	private HashMap<String, FileWriter> writerMap = new HashMap<String, FileWriter>();
	private ReadWriteLock lock = new ReentrantReadWriteLock();
	private String filePath = "/home/cloud/logs/";
	private String lineSeparator = System.getProperty("line.separator");
	private static final long serialVersionUID = -8235524993337289148L;

	@Override
	public void execute(Tuple input, BasicOutputCollector collector) {
		String partition = input.getStringByField("partition");
		String line = input.getStringByField("line");
		lock.readLock().lock();
		FileWriter fileWriter = writerMap.get(partition);
		if (fileWriter == null) {
			// upgrade to the write lock to create the writer for this partition
			lock.readLock().unlock();
			lock.writeLock().lock();
			try {
				// re-check: another task may have created the writer in the meantime
				fileWriter = writerMap.get(partition);
				if (fileWriter == null) {
					fileWriter = new FileWriter(filePath + partition, true);
					writerMap.put(partition, fileWriter);
				}
			} catch (IOException e) {
				log.error(e);
			} finally {
				lock.writeLock().unlock();
			}
			lock.readLock().lock();
		}
		try {
			if (fileWriter != null) { // writer creation above may have failed with an IOException
				fileWriter.write(line);
				fileWriter.write(lineSeparator);
				fileWriter.flush();
			}
		} catch (IOException e) {
			log.error(e);
		} finally{
			lock.readLock().unlock();
		}
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {

	}

}

Commonly used classes

BaseRichSpout (message producer)
BaseBasicBolt (message processor)
TopologyBuilder (builds the topology)
Values (put data into a Values object to send it to the next component)
Tuple (emitted data is wrapped in a Tuple; a component receives upstream messages through the tuple)
Config (configuration)
StormSubmitter / LocalCluster (topology submitters)
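
As a quick reference, here is a minimal sketch that wires these classes together (the class, package, and component names are made up for illustration; the WordCount code above is the full version):

package cn.newbies.storm.demo; // hypothetical package for this sketch

import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class SkeletonTopo {

	// BaseRichSpout: the message producer
	public static class DemoSpout extends BaseRichSpout {
		private static final long serialVersionUID = 1L;
		private SpoutOutputCollector collector;

		@Override
		@SuppressWarnings("rawtypes")
		public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
			this.collector = collector;
		}

		@Override
		public void nextTuple() {
			Utils.sleep(1000); // avoid a busy loop
			// Values carries the data sent to the next component
			collector.emit(new Values("hello storm"));
		}

		@Override
		public void declareOutputFields(OutputFieldsDeclarer declarer) {
			declarer.declare(new Fields("line"));
		}
	}

	// BaseBasicBolt: the message processor
	public static class DemoBolt extends BaseBasicBolt {
		private static final long serialVersionUID = 1L;

		@Override
		public void execute(Tuple input, BasicOutputCollector collector) {
			// the upstream data arrives wrapped in a Tuple
			System.out.println(input.getString(0));
		}

		@Override
		public void declareOutputFields(OutputFieldsDeclarer declarer) {
		}
	}

	public static void main(String[] args) {
		TopologyBuilder builder = new TopologyBuilder();
		builder.setSpout("demo-spout", new DemoSpout());
		builder.setBolt("demo-bolt", new DemoBolt()).shuffleGrouping("demo-spout");
		Config conf = new Config();
		// LocalCluster for in-process testing; use StormSubmitter.submitTopology on a real cluster
		new LocalCluster().submitTopology("demo", conf, builder.createTopology());
	}
}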

Developing your first Storm program

Start from the Storm WordCount example: the complete code is in the WordCountTopo, WordReader, WordSpliter and WordCounter classes above.


Reposted from blog.csdn.net/qq_31784189/article/details/103551619