1. Concept introduction
1. Storm is a distributed real-time computation system for processing large volumes of streaming data.
2. Features
- Supports many real-time scenarios: processing messages and updating databases in real time; continuously querying or computing over data streams and pushing the latest results to clients for display; parallelizing expensive queries via distributed RPC calls.
- Highly scalable: easy to scale out by adding machines and raising the parallelism.
- Guarantees that no data is lost.
- Very robust.
- Easy to use: the core semantics are simple.
3. Operation process
4. Key terms
- Task: each copy of the spout/bolt code runs in a task; a component's parallelism is the number of tasks it runs in.
- Stream grouping: defines how the data stream is routed between tasks.
- Stream grouping strategies: Shuffle Grouping: tuples are emitted to downstream tasks at random.
- Fields Grouping: tuples are routed based on the values of one or more fields, so tuples with the same field values always go to the same task.
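Fields grouping is commonly understood as hash partitioning on the grouped field. The following is a minimal stdlib-Java sketch, not Storm's actual internals; the `taskFor` helper, the task count, and the sample words are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class FieldsGroupingSketch {
    // Hypothetical helper: picks the downstream task for a tuple the way a
    // fields grouping on a single field might, by hashing the field value.
    static int taskFor(String fieldValue, int numTasks) {
        return Math.abs(fieldValue.hashCode() % numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        String[] words = {"the", "cow", "the", "moon", "cow", "the"};
        Map<Integer, Integer> perTask = new HashMap<>();
        for (String w : words) {
            perTask.merge(taskFor(w, numTasks), 1, Integer::sum);
        }
        // The same word always hashes to the same task, which is what lets a
        // downstream counting bolt keep correct per-word totals.
        System.out.println("the -> task " + taskFor("the", numTasks));
        System.out.println("cow -> task " + taskFor("cow", numTasks));
    }
}
```

The key property is determinism: every occurrence of a given word lands on the same task, while shuffle grouping would spread occurrences randomly.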
5. Getting started example
package com.mmc.storm;

import lombok.extern.slf4j.Slf4j;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/**
 * @description: word count
 * @author: mmc
 * @create: 2019-10-21 22:47
 **/
@Slf4j
public class WordCountTopology {

    /**
     * Responsible for pulling data from the data source.
     */
    public static class RandomSentenceSpout extends BaseRichSpout {

        private SpoutOutputCollector collector;
        private Random random;

        @Override
        public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
            this.collector = spoutOutputCollector;
            this.random = new Random();
        }

        /**
         * Runs inside a task: the task calls this method in a loop, so new
         * data is emitted continuously, forming a stream.
         */
        @Override
        public void nextTuple() {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            String[] sentences = new String[]{"the cow jumped over the moon", "an apple a day keeps the doctor away",
                    "four score and seven years ago", "snow white and the seven dwarfs", "i am at two with nature"};
            String sentence = sentences[random.nextInt(sentences.length)];
            log.info("Emitting sentence: " + sentence);
            // Values builds a tuple; a tuple is the smallest unit of data in Storm
            collector.emit(new Values(sentence));
        }

        /**
         * Declares the field names of the tuples emitted by this spout.
         */
        @Override
        public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
            outputFieldsDeclarer.declare(new Fields("sentence"));
        }
    }

    /**
     * Each bolt's code is likewise shipped to tasks for execution.
     */
    public static class SplitSentence extends BaseRichBolt {

        private OutputCollector collector;

        @Override
        public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
            this.collector = outputCollector;
        }

        /**
         * Every tuple received is handed to the execute method for processing.
         */
        @Override
        public void execute(Tuple tuple) {
            String sentence = tuple.getStringByField("sentence");
            String[] words = sentence.split(" ");
            for (String word : words) {
                collector.emit(new Values(word));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
            outputFieldsDeclarer.declare(new Fields("word"));
        }
    }

    public static class WordCount extends BaseRichBolt {

        private OutputCollector collector;
        private Map<String, Long> wordCountMap = new HashMap<>();

        @Override
        public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
            this.collector = outputCollector;
        }

        @Override
        public void execute(Tuple tuple) {
            String word = tuple.getStringByField("word");
            Long count = wordCountMap.get(word);
            if (count == null) {
                count = 1L;
            } else {
                count++;
            }
            wordCountMap.put(word, count);
            log.info("[word count] {} has appeared {} times", word, count);
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
            outputFieldsDeclarer.declare(new Fields("word", "count"));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Wire the spout and bolts together into a topology
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("RandomSentence", new RandomSentenceSpout(), 2);
        builder.setBolt("SplitSentence", new SplitSentence(), 5).setNumTasks(10).shuffleGrouping("RandomSentence");
        builder.setBolt("WordCount", new WordCount(), 10).setNumTasks(20)
                .fieldsGrouping("SplitSentence", new Fields("word"));
        Config config = new Config();
        // Launched from the command line: submit to the cluster
        if (args != null && args.length > 0) {
            config.setNumWorkers(3);
            try {
                StormSubmitter.submitTopology(args[0], config, builder.createTopology());
            } catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) {
                e.printStackTrace();
            }
        } else {
            // Otherwise run in an in-process local cluster for testing
            config.setMaxTaskParallelism(20);
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("WordCountTopology", config, builder.createTopology());
            Thread.sleep(60000);
            cluster.shutdown();
        }
    }
}
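Without a Storm installation you can sanity-check the topology's logic with plain Java: split the spout's sentences into words, route each word by hash (mimicking the fields grouping on "word"), and keep one count map per simulated task, like `wordCountMap` in the `WordCount` bolt. The task count and the hash-mod routing below are illustrative assumptions, not Storm internals:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountSimulation {
    public static void main(String[] args) {
        String[] sentences = {"the cow jumped over the moon", "an apple a day keeps the doctor away"};
        int numTasks = 3; // simulated WordCount tasks (arbitrary for illustration)

        // One count map per simulated task, like wordCountMap in the WordCount bolt
        List<Map<String, Long>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new HashMap<>());
        }

        for (String sentence : sentences) {                       // spout + SplitSentence
            for (String word : sentence.split(" ")) {
                int task = Math.abs(word.hashCode() % numTasks);  // fields grouping on "word"
                tasks.get(task).merge(word, 1L, Long::sum);       // WordCount bolt
            }
        }

        // Because routing is deterministic, exactly one task holds the full
        // total for each word, so per-task counts are globally correct.
        long theCount = 0;
        for (Map<String, Long> m : tasks) {
            theCount += m.getOrDefault("the", 0L);
        }
        System.out.println("\"the\" counted " + theCount + " times"); // 3
    }
}
```

If shuffle grouping were used between SplitSentence and WordCount instead, occurrences of the same word could land on different tasks and each task would only see a partial count.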
2. Cluster deployment
1. Download Storm
Download address: www.apache.org/dyn/closer.…
2. Configure environment variables
vi ~/.bashrc
export STORM_HOME=/usr/local/storm
export PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$STORM_HOME/bin
source ~/.bashrc
3. Modify the configuration
Open storm/conf/storm.yaml and add the following configuration
storm.zookeeper.servers:
- "192.168.1.12"
- "192.168.1.13"
- "192.168.1.14"
nimbus.seeds: ["192.168.1.12"]
storm.local.dir: "/var/storm"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
4. Create a folder
mkdir /var/storm
5. Start
- Start ZooKeeper first
- On one node, start Nimbus: storm nimbus >/dev/null 2>&1 &
- On all three nodes, start the supervisor: storm supervisor >/dev/null 2>&1 &
- On one node, start the UI: storm ui >/dev/null 2>&1 &
- On the two supervisor nodes, start the log viewer: storm logviewer >/dev/null 2>&1 &
6. Shutdown
Kill the topology by the name it was submitted with, e.g. storm kill WordCountTopology