Integrating Kafka with Storm

Original author: 王宇
2016-10-27

Storm Integration Concepts

  • Integration scenario

    A spout is a source of streams. For example, a spout may read tuples off a Kafka topic and emit them as a stream. A bolt consumes input streams, processes them, and possibly emits new streams. Bolts can do anything from running functions and filtering tuples to doing streaming aggregations, streaming joins, talking to databases, and more. Each node in a Storm topology executes in parallel. A topology runs indefinitely until you terminate it. Storm automatically reassigns any failed tasks, and it guarantees that there will be no data loss, even if machines go down and messages are dropped.

  • BrokerHosts - ZkHosts & StaticHosts
    BrokerHosts is an interface, and ZkHosts and StaticHosts are its two main implementations. ZkHosts tracks the Kafka brokers dynamically by maintaining their details in ZooKeeper, while StaticHosts is used to set the Kafka brokers and their details manually / statically. ZkHosts is the simpler and faster way to access a Kafka broker; both variants are sketched below.
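
    A minimal sketch of constructing each variant against the storm-kafka 1.0.x API (the host, port, and partition values are illustrative):

      // Dynamic: discover broker details from ZooKeeper.
      BrokerHosts zkHosts = new ZkHosts("localhost:2181");

      // Static: spell out the brokers and the partition assignment by hand.
      GlobalPartitionInformation partitionInfo = new GlobalPartitionInformation("my-first-topic");
      partitionInfo.addPartition(0, new Broker("localhost", 9092));
      BrokerHosts staticHosts = new StaticHosts(partitionInfo);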

  • KafkaConfig API
    This API is used to define configuration settings for the Kafka cluster. The signature of KafkaConfig is defined as follows:

      public KafkaConfig(BrokerHosts hosts, String topic)

    Hosts − the BrokerHosts can be ZkHosts / StaticHosts.
    Topic − the topic name.
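
    A one-line usage sketch, pairing either hosts variant from above with the topic:

      KafkaConfig kafkaConfig = new KafkaConfig(zkHosts, "my-first-topic");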

  • SpoutConfig API
    SpoutConfig is an extension of KafkaConfig that supports additional ZooKeeper information.

      public SpoutConfig(BrokerHosts hosts, String topic, String zkRoot, String id)

    • Hosts − the BrokerHosts can be any implementation of the BrokerHosts interface.
    • Topic − the topic name.
    • zkRoot − the ZooKeeper root path.
    • id − the spout stores the state of the offsets it has consumed in ZooKeeper, so the id should uniquely identify your spout. The KafkaSpout example below shows a concrete construction.
  • SchemeAsMultiScheme
    SchemeAsMultiScheme dictates how the ByteBuffer consumed from Kafka is converted into a Storm tuple. It wraps a single Scheme implementation, such as StringScheme, which simply decodes the payload into a string; a custom Scheme is sketched below.
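
    A minimal sketch of a custom Scheme against the Storm 1.0.x interface (where deserialize receives a ByteBuffer); the LowerCaseScheme name and its lower-casing behavior are hypothetical:

      import java.nio.ByteBuffer;
      import java.nio.charset.StandardCharsets;
      import java.util.List;
      import org.apache.storm.spout.Scheme;
      import org.apache.storm.tuple.Fields;
      import org.apache.storm.tuple.Values;

      // Hypothetical scheme: decode each Kafka message and lower-case it.
      public class LowerCaseScheme implements Scheme {
          @Override
          public List<Object> deserialize(ByteBuffer buffer) {
              String message = StandardCharsets.UTF_8.decode(buffer).toString();
              return new Values(message.toLowerCase());
          }

          @Override
          public Fields getOutputFields() {
              return new Fields("sentence");
          }
      }

    It would then be assigned just like StringScheme: spoutConfig.scheme = new SchemeAsMultiScheme(new LowerCaseScheme());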

  • KafkaSpout API
    KafkaSpout is our spout implementation, which integrates with Storm. It fetches messages from a Kafka topic and emits them into the Storm ecosystem as tuples. KafkaSpout gets its configuration details from SpoutConfig.
    // ZooKeeper connection string
    BrokerHosts hosts = new ZkHosts(zkConnString);

    // Create the SpoutConfig object (hosts, topic, zkRoot, unique spout id)
    SpoutConfig spoutConfig = new SpoutConfig(hosts,
        topicName, "/" + topicName, UUID.randomUUID().toString());

    // Convert the consumed ByteBuffer to a String
    spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

    // Assign the SpoutConfig to the KafkaSpout
    KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

Creating the Bolts

  • For the Bolt interface definition, see 《Storm学习笔记》 (Storm study notes).
  • Example bolts that split sentences into words and count word occurrences:
    • SplitBolt.java
  import java.util.Map;
  import org.apache.storm.tuple.Tuple;
  import org.apache.storm.tuple.Fields;
  import org.apache.storm.tuple.Values;
  import org.apache.storm.task.OutputCollector;
  import org.apache.storm.topology.OutputFieldsDeclarer;
  import org.apache.storm.topology.IRichBolt;
  import org.apache.storm.task.TopologyContext;

  public class SplitBolt implements IRichBolt {
      private OutputCollector collector;

      @Override
      public void prepare(Map stormConf, TopologyContext context,
              OutputCollector collector) {
          this.collector = collector;
      }

      @Override
      public void execute(Tuple input) {
          // Split each incoming sentence and emit one tuple per word.
          String sentence = input.getString(0);
          String[] words = sentence.split(" ");
          for (String word : words) {
              word = word.trim();
              if (!word.isEmpty()) {
                  word = word.toLowerCase();
                  collector.emit(new Values(word));
              }
          }
          collector.ack(input);
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
          declarer.declare(new Fields("word"));
      }

      @Override
      public void cleanup() {}

      @Override
      public Map<String, Object> getComponentConfiguration() {
          return null;
      }
  }
  • CountBolt.java
  import java.util.Map;
  import java.util.HashMap;
  import org.apache.storm.tuple.Tuple;
  import org.apache.storm.task.OutputCollector;
  import org.apache.storm.topology.OutputFieldsDeclarer;
  import org.apache.storm.topology.IRichBolt;
  import org.apache.storm.task.TopologyContext;

  public class CountBolt implements IRichBolt {
      Map<String, Integer> counters;
      private OutputCollector collector;

      @Override
      public void prepare(Map stormConf, TopologyContext context,
              OutputCollector collector) {
          this.counters = new HashMap<String, Integer>();
          this.collector = collector;
      }

      @Override
      public void execute(Tuple input) {
          // Increment the in-memory count for the received word.
          String str = input.getString(0);
          if (!counters.containsKey(str)) {
              counters.put(str, 1);
          } else {
              Integer c = counters.get(str) + 1;
              counters.put(str, c);
          }
          collector.ack(input);
      }

      @Override
      public void cleanup() {
          // Print the final counts when the topology shuts down.
          for (Map.Entry<String, Integer> entry : counters.entrySet()) {
              System.out.println(entry.getKey() + " : " + entry.getValue());
          }
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
      }

      @Override
      public Map<String, Object> getComponentConfiguration() {
          return null;
      }
  }

Submitting the Topology

  • For the Topology concept, see 《Storm学习笔记》 (Storm study notes).
  • KafkaStormSample.java
  import java.util.UUID;
  import org.apache.storm.Config;
  import org.apache.storm.LocalCluster;
  import org.apache.storm.topology.TopologyBuilder;
  import org.apache.storm.spout.SchemeAsMultiScheme;
  import org.apache.storm.kafka.ZkHosts;
  import org.apache.storm.kafka.BrokerHosts;
  import org.apache.storm.kafka.SpoutConfig;
  import org.apache.storm.kafka.KafkaSpout;
  import org.apache.storm.kafka.StringScheme;

  public class KafkaStormSample {
      public static void main(String[] args) throws Exception {
          Config config = new Config();
          config.setDebug(true);
          // Limit in-flight tuples so the demo processes one at a time.
          config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

          String zkConnString = "localhost:2181";
          String topic = "my-first-topic";

          BrokerHosts hosts = new ZkHosts(zkConnString);
          SpoutConfig kafkaSpoutConfig = new SpoutConfig(hosts, topic,
              "/" + topic, UUID.randomUUID().toString());
          kafkaSpoutConfig.bufferSizeBytes = 1024 * 1024 * 4;
          kafkaSpoutConfig.fetchSizeBytes = 1024 * 1024 * 4;
          // kafkaSpoutConfig.forceFromStart = true;
          kafkaSpoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("kafka-spout", new KafkaSpout(kafkaSpoutConfig));
          builder.setBolt("word-spitter", new SplitBolt()).shuffleGrouping("kafka-spout");
          builder.setBolt("word-counter", new CountBolt()).shuffleGrouping("word-spitter");

          // Run in-process for ten seconds, then shut down.
          LocalCluster cluster = new LocalCluster();
          cluster.submitTopology("KafkaStormSample", config, builder.createTopology());
          Thread.sleep(10000);
          cluster.shutdown();
      }
  }
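
  The sample runs inside an in-process LocalCluster for ten seconds and then shuts down, which is convenient for testing. On a real cluster the topology is instead handed to Nimbus; a minimal sketch (the compiled classes would be packaged into a jar and launched with the storm CLI):

    import org.apache.storm.StormSubmitter;

    // Replace the LocalCluster block above with a cluster submission.
    StormSubmitter.submitTopology("KafkaStormSample", config, builder.createTopology());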

Versions

  • Zookeeper: zookeeper-3.5.2-alpha.tar.gz
  • Curator: 2.9.1
  • SLF4j: slf4j-1.7.21.tar.gz
  • Kafka: kafka_2.11-0.10.1.0
  • Storm : apache-storm-1.0.2.tar.gz
  • JSON: json-simple-1.1.1.jar
  • JDK: 1.8.0

Compilation

  • Dependencies

    • Curator
      Before compiling, the Kafka-Storm integration needs the Curator ZooKeeper client Java library. Add the paths of these jars to the CLASSPATH:

      curator-client-2.9.1.jar 
      curator-framework-2.9.1.jar 
    • JSON 
      json-simple-1.1.1.jar
    • Storm-Kafka
      storm-kafka-1.0.2.jar
    • Kafka Lib
    • Storm Lib
    • SLF4J
      Once both the Kafka and Storm lib directories are on the CLASSPATH, the SLF4J bindings conflict; remove the SLF4J jar that ships with Kafka from the CLASSPATH.
  • Compile command

    $ javac *.java
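
    The compiler (and the java launcher later) needs all of the jars above on the CLASSPATH. A sketch of one way to assemble it before running javac, assuming Kafka and Storm are installed under /opt as in the execution steps below and the remaining jars sit in the current directory (paths are illustrative):

      $ export CLASSPATH=".:curator-client-2.9.1.jar:curator-framework-2.9.1.jar:json-simple-1.1.1.jar:/opt/storm/lib/*:/opt/kafka/libs/*"

    As noted above, the SLF4J binding jar bundled with Kafka should be removed so it does not clash with Storm's.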

Execution

  • Start the services: ZooKeeper, Kafka, Storm (nimbus and supervisor run in the foreground, so start them in separate terminals)

    $ cd /opt/zookeeper
    $ ./bin/zkServer.sh start
    $ cd /opt/kafka
    $ ./bin/kafka-server-start.sh config/server.properties
    $ cd /opt/storm
    $ ./bin/storm nimbus
    $ ./bin/storm supervisor
  • Create the topic "my-first-topic"

    $ ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic my-first-topic
    $ ./bin/kafka-topics.sh --list --zookeeper localhost:2181
  • Enter messages in the Kafka producer CLI

    $ ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-first-topic
    hello
    kafka
    storm
    spark
    test message
    another test message
  • Run the sample

    $ java KafkaStormSample
  • Output (the counts print when the topology shuts down; ordering may vary)

    storm : 1
    test : 2
    spark : 1
    another : 1
    kafka : 1
    hello : 1
    message : 2

Reposted from wangyuxxx.iteye.com/blog/2342481