Storm Notes (5): Reliability Analysis, Scheduled Tasks, and Storm UI Parameters Explained

[TOC]


Special note: in the previous four Storm notes, the spout in the running-sum example used an infinite loop inside nextTuple. This is actually incorrect, and the reason is simple: in the API Storm provides, the nextTuple method is itself called in a loop, so adding a loop inside it amounts to a double loop. More importantly, as the acker reliability case below shows, once the infinite loop is added the task that owns nextTuple can never return from the method, so the subsequent ack and fail methods are never executed. This deserves special attention.
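
To make this concrete, here is a minimal sketch of the corrected pattern (it reuses the collector and num fields of the full example later in this post): emit at most one tuple per call and then return, letting Storm call nextTuple again on the next iteration of its own loop, so that ack/fail (which run on the same executor thread) get a chance to execute.

@Override
public void nextTuple() {
    // while (true) { ... }   // wrong: the executor thread never leaves this method, so ack()/fail() never run
    num++;
    String messageId = UUID.randomUUID().toString();
    this.collector.emit(new Values(num), messageId);   // emit once and return; Storm will call nextTuple again
}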

Storm reliability analysis

Fundamentals

  • worker process dies

    If a worker process dies, the Storm cluster will restart it.

  • supervisor process dies

    If the supervisor process dies, the topologies already submitted are not affected, but no new tasks can be assigned to this node afterwards, because the node is no longer a member of the cluster.

  • The nimbus process dies (fail fast; HA concerns)

    If the nimbus process dies, the topologies already submitted are not affected, but no new topology can be submitted to the cluster afterwards. Versions below 1.0 had this HA problem; since 1.0 it has been fixed and multiple standby nimbus instances can be configured.

  • Node goes down

    If a node goes down, the tasks assigned to that machine will time out and Nimbus will reassign them to other machines.

  • ack/fail message acknowledgment mechanism (to ensure a tuple is fully processed)

    • When the spout emits a tuple it must also pass a messageId; doing so is what enables the message acknowledgment mechanism.
    • If the topology carries a large volume of tuples, configure more ackers so that acknowledgment does not become a bottleneck.
    • Set the number of ackers for a topology with config.setNumAckers(num); the default value is 1 (see the snippet after this list).
    • Note: the acker uses a clever (XOR-based) algorithm so that the memory needed to track the state of each spout tuple is constant, about 20 bytes. I will not go into the algorithm here; searching for "storm acker" turns up detailed analyses.
    • Note: if a tuple is not fully processed within the configured timeout (Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 30 seconds by default), it is considered failed.
  • A tuple is fully processed

    "Fully processed" in Storm means that the tuple and every tuple derived from it (its tuple tree) have been processed successfully.
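
As a small illustration of the settings mentioned in the list above (the values here are arbitrary), the acker count and the message timeout can be set on the topology's Config before it is submitted:

Config config = new Config();
config.setNumAckers(2);             // number of acker executors, default is 1
config.setMessageTimeoutSecs(60);   // Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, default is 30 seconds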

Acker reliability example

As mentioned earlier, to use the ack/fail acknowledgment mechanism you need to do the following:

1. Override the ack and fail methods in our spout.
2. Carry a messageId when the spout emits a tuple.
3. In the bolt, explicitly call ack or fail after processing succeeds or fails.

Following the description above, the program code is given below; note where each of these points appears:

package cn.xpleaf.bigdata.storm.acker;

import cn.xpleaf.bigdata.storm.utils.StormUtil;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.util.Date;
import java.util.Map;
import java.util.UUID;

/**
 * 1. Running-sum example: the data source keeps producing increasing numbers and we accumulate their sum.
 * <p>
 * Storm components: Spout and Bolt; the data unit is the Tuple; a Topology built in main ties the spout and the bolt together.
 * MapReduce components: Mapper and Reducer; the data unit is a Writable; a Job built in main ties the two together.
 * <p>
 * Adapter pattern: BaseRichSpout overrides the methods of the implemented interface that are usually not needed,
 * giving them empty implementations; this is what we call the adapter pattern.
 * <p>
 * Storm message acknowledgment mechanism: reliability analysis
 * acker
 * fail
 */
public class AckerSumTopology {

    /**
     * Data source
     */
    static class OrderSpout extends BaseRichSpout {

        private Map conf;   // configuration of the current component
        private TopologyContext context;    // context object of the current component
        private SpoutOutputCollector collector; // collector used to emit tuples

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.conf = conf;
            this.context = context;
            this.collector = collector;
        }

         private long num = 0;
        /**
         * Core method that produces and emits data
         */
        @Override
        public void nextTuple() {
            String messageId = UUID.randomUUID().toString().replaceAll("-", "").toLowerCase();
//            while (true) {   // do NOT loop here: Storm already calls nextTuple() in a loop
                num++;
                StormUtil.sleep(1000);
                System.out.println("Order amount generated at " + StormUtil.df_yyyyMMddHHmmss.format(new Date()) + ": " + num);
                this.collector.emit(new Values(num), messageId);
//            }
        }

        /**
         * Declares the schema that describes the emitted data
         */
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("order_cost"));
        }

        @Override
        public void ack(Object msgId) {
            System.out.println(msgId + " -> the corresponding message was processed successfully");
        }

        @Override
        public void fail(Object msgId) {
            System.out.println(msgId + " ----> the corresponding message failed to be processed");
        }
    }

    /**
     * Bolt that computes the running sum
     */
    static class SumBolt extends BaseRichBolt {

        private Map conf;   // configuration of the current component
        private TopologyContext context;    // context object of the current component
        private OutputCollector collector; // collector used to emit tuples

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.conf = stormConf;
            this.context = context;
            this.collector = collector;
        }

        private Long sumOrderCost = 0L;

        /**
         * Core method that processes data
         */
        @Override
        public void execute(Tuple input) {
            Long orderCost = input.getLongByField("order_cost");
            sumOrderCost += orderCost;
            if (orderCost % 10 == 1) {   // simulate a message failure once every 10 messages
                collector.fail(input);
            } else {
                System.out.println("Thread ID: " + Thread.currentThread().getId() + ", total transaction amount of the mall site up to " + StormUtil.df_yyyyMMddHHmmss.format(new Date()) + ": " + sumOrderCost);
                collector.ack(input);
            }
            StormUtil.sleep(1000);
        }

        /**
         * If this bolt is the last processing unit, this method can be left empty
         */
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {

        }
    }

    /**
     * Build the topology, analogous to building a Job in MapReduce
     */
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        /**
         * Set up the DAG (directed acyclic graph) of the spout and bolt
         */
        builder.setSpout("id_order_spout", new OrderSpout());
        builder.setBolt("id_sum_bolt", new SumBolt(), 1)
                .shuffleGrouping("id_order_spout"); // the grouping strategy specifies this bolt's upstream component and how data flows to it
        // build the topology with the builder
        StormTopology topology = builder.createTopology();
        String topologyName = AckerSumTopology.class.getSimpleName();  // name of the topology
        Config config = new Config();   // Config extends HashMap and also provides some basic configuration helpers
        // launch the topology: LocalCluster for local mode, StormSubmitter for cluster mode
        if (args == null || args.length < 1) {  // local mode when there are no arguments, cluster mode otherwise
            LocalCluster localCluster = new LocalCluster(); // local development mode uses a LocalCluster
            localCluster.submitTopology(topologyName, config, topology);
        } else {
            StormSubmitter.submitTopology(topologyName, config, topology);
        }
    }
}

After running it (either locally or by submitting the job to the cluster), the output is as follows:

Order amount generated at 20180413215706: 1
Order amount generated at 20180413215707: 2
7a4ce596fd3a40659f2d7f80a7738f55 ----> the corresponding message failed to be processed
Thread ID: 133, total transaction amount of the mall site up to 20180413215707: 3
Order amount generated at 20180413215708: 3
0555a933a49f413e94480be201a55615 -> the corresponding message was processed successfully
Thread ID: 133, total transaction amount of the mall site up to 20180413215708: 6
Order amount generated at 20180413215709: 4
4b923132e4034e939c875aca368a8897 -> the corresponding message was processed successfully
Thread ID: 133, total transaction amount of the mall site up to 20180413215709: 10
Order amount generated at 20180413215710: 5
51f159472e854ba282ab84a2218459b8 -> the corresponding message was processed successfully
Thread ID: 133, total transaction amount of the mall site up to 20180413215710: 15
......

Storm scheduled tasks

Business data is usually persisted, in the end, to an RDBMS, but an RDBMS cannot sustain very high access rates or real-time processing; its limited capacity can lead to problems such as dropped connections. A common workaround is to first cache the data in a fast in-memory store (such as Redis) and then periodically synchronize the data from the in-memory store to the RDBMS, i.e. to do the work on a schedule.

  • Aggregate the data and write it to the database at a fixed interval.
  • Or execute some other logic at a fixed interval.

You can use Storm's timed tasks to implement this kind of periodic data landing, but first you need to understand how Storm timed tasks work.

Global scheduled tasks

Set the following in main:

conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60);  // how often (in seconds) a system tick tuple is emitted

However, we usually only need a timed task in a specific bolt; sending system tuples to every bolt in the topology is unnecessary and wastes resources. That is why local (per-bolt) timed tasks exist, and they are the ones we use most often.

Note: when the timer is set in the main function, Storm sends a system-level tuple to every bolt in the topology at the interval the user configured. If you only need the timing behavior in one particular bolt, simply override the getComponentConfiguration method in that bolt and set the tick interval there.
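
As a side note (this helper is not part of the original code below, just a common pattern), the check for a tick tuple can be made a little stricter by also verifying the stream id, which specifically identifies tuples emitted by the system timer:

// returns true only for the tick tuples emitted by Storm's system timer
private static boolean isTickTuple(Tuple tuple) {
    return Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
            && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
}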

The test code is as follows:

package cn.xpleaf.bigdata.storm.quartz;

import org.apache.storm.Config;
import org.apache.storm.Constants;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.shade.org.apache.commons.io.FileUtils;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.io.File;
import java.io.IOException;
import java.util.*;

/**
 * 2. Word count: monitor the files under a directory; whenever a new file appears,
        read it in, parse its contents and count the total number of occurrences of each word.
        (The monitored directory in this example is D:/data/storm.)

        This example studies Storm scheduled tasks. There are two ways to set them up:
            1. set in main: effective globally
            2. set in a specific bolt: effective only within that bolt
 */
public class QuartzWordCountTopology {

    /**
     * Spout: obtains the data source; here it keeps reading the files under a directory and emits each line to the next Bolt
     */
    static class FileSpout extends BaseRichSpout {
        private Map conf;   // configuration of the current component
        private TopologyContext context;    // context object of the current component
        private SpoutOutputCollector collector; // collector used to emit tuples

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.conf = conf;
            this.context = context;
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            File directory = new File("D:/data/storm");
            // the second argument (extensions) means only files with the given suffixes are collected
            Collection<File> files = FileUtils.listFiles(directory, new String[]{"txt"}, true);
            for (File file : files) {
                try {
                    List<String> lines = FileUtils.readLines(file, "utf-8");
                    for(String line : lines) {
                        this.collector.emit(new Values(line));
                    }
                    // after the current file has been consumed it is renamed; a random UUID (a timestamp would also work) is appended so the same file is not picked up again
                    File destFile = new File(file.getAbsolutePath() + "_" + UUID.randomUUID().toString() + ".completed");
                    FileUtils.moveFile(file, destFile);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }
    }

    /**
     * Bolt that splits each received line into words and emits them to the next node
     */
    static class SplitBolt extends BaseRichBolt {

        private Map conf;   // configuration of the current component
        private TopologyContext context;    // context object of the current component
        private OutputCollector collector; // collector used to emit tuples

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.conf = stormConf;
            this.context = context;
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            if (!input.getSourceComponent().equalsIgnoreCase(Constants.SYSTEM_COMPONENT_ID) ) { // only run our business logic if the tuple was not sent by the system
                String line = input.getStringByField("line");
                String[] words = line.split(" ");
                for (String word : words) {
                    this.collector.emit(new Values(word, 1));
                }
            } else {
                System.out.println("splitBolt: " + input.getSourceComponent().toString());
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

    /**
     * Bolt that performs the word count
     */
    static class WCBolt extends BaseRichBolt {

        private Map conf;   // configuration of the current component
        private TopologyContext context;    // context object of the current component
        private OutputCollector collector; // collector used to emit tuples

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.conf = stormConf;
            this.context = context;
            this.collector = collector;
        }

        private Map<String, Integer> map = new HashMap<>();

        @Override
        public void execute(Tuple input) {
            if (!input.getSourceComponent().equalsIgnoreCase(Constants.SYSTEM_COMPONENT_ID) ) { // only run our business logic if the tuple was not sent by the system
                String word = input.getStringByField("word");
                Integer count = input.getIntegerByField("count");
            /*if (map.containsKey(word)) {
                map.put(word, map.get(word) + 1);
            } else {
                map.put(word, 1);
            }*/
                map.put(word, map.getOrDefault(word, 0) + 1);

                System.out.println("====================================");
                map.forEach((k, v) -> {
                    System.out.println(k + ":::" + v);
                });
            } else {
                System.out.println("sumBolt: " + input.getSourceComponent().toString());
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {

        }
    }

    /**
     * Build the topology by assembling the Spout and Bolt nodes, analogous to building a Job in MapReduce
     */
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // dag
        builder.setSpout("id_file_spout", new FileSpout());
        builder.setBolt("id_split_bolt", new SplitBolt()).shuffleGrouping("id_file_spout");
        builder.setBolt("id_wc_bolt", new WCBolt()).shuffleGrouping("id_split_bolt");

        StormTopology stormTopology = builder.createTopology();
        LocalCluster cluster = new LocalCluster();
        String topologyName = QuartzWordCountTopology.class.getSimpleName();
        Config config = new Config();
        config.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
        cluster.submitTopology(topologyName, config, stormTopology);
    }
}

output:

splitBolt: __system
sumBolt: __system
splitBolt: __system
sumBolt: __system
......

Local (per-bolt) scheduled tasks

In the bolt, use the following check to determine whether the incoming tuple is the system tick tuple:

tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)

If it evaluates to true, execute the code that belongs to the timed task and then return; if false, execute the normal business logic for processing the tuple.

In other words, for a bolt that needs to persist data we set the timing on that bolt alone; the system then periodically sends a system-level tuple to it. In the bolt's code we check the tuple's source: if it is a system-level tuple, perform the data-landing operation, for example writing the accumulated data to a database; otherwise execute our business code along the normal path.

This approach is used very often in practice; a sketch of the pattern follows.
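
As a rough sketch of the data-landing pattern described above (flushToDatabase is a hypothetical placeholder, and this class is not part of the test program that follows), such a bolt buffers results in memory and only writes them out when a tick tuple arrives:

static class PersistBolt extends BaseRichBolt {
    private OutputCollector collector;
    private Map<String, Integer> buffer = new HashMap<>();

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        if (Constants.SYSTEM_COMPONENT_ID.equals(input.getSourceComponent())) {
            flushToDatabase(buffer);   // hypothetical helper: write the buffered data to redis or an RDBMS
            buffer.clear();
        } else {
            String word = input.getStringByField("word");
            buffer.put(word, buffer.getOrDefault(word, 0) + input.getIntegerByField("count"));
        }
        collector.ack(input);
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {   // local timing: only this bolt receives tick tuples
        Map<String, Object> conf = new HashMap<>();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60);    // tick this bolt every 60 seconds
        return conf;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }

    private void flushToDatabase(Map<String, Integer> data) { /* placeholder for the actual persistence code */ }
}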

The test procedure is as follows:

package cn.xpleaf.bigdata.storm.quartz;

import org.apache.storm.Config;
import org.apache.storm.Constants;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.shade.org.apache.commons.io.FileUtils;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.io.File;
import java.io.IOException;
import java.util.*;

/**
 * 2. Word count: monitor the files under a directory; whenever a new file appears,
        read it in, parse its contents and count the total number of occurrences of each word.
        (The monitored directory in this example is D:/data/storm.)

        This example studies Storm scheduled tasks. There are two ways to set them up:
            1. set in main: effective globally
            2. set in a specific bolt: effective only within that bolt
 */
public class QuartzPartWCTopology {

    /**
     * Spout: obtains the data source; here it keeps reading the files under a directory and emits each line to the next Bolt
     */
    static class FileSpout extends BaseRichSpout {
        private Map conf;   // configuration of the current component
        private TopologyContext context;    // context object of the current component
        private SpoutOutputCollector collector; // collector used to emit tuples

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.conf = conf;
            this.context = context;
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            File directory = new File("D:/data/storm");
            // the second argument (extensions) means only files with the given suffixes are collected
            Collection<File> files = FileUtils.listFiles(directory, new String[]{"txt"}, true);
            for (File file : files) {
                try {
                    List<String> lines = FileUtils.readLines(file, "utf-8");
                    for(String line : lines) {
                        this.collector.emit(new Values(line));
                    }
                    // after the current file has been consumed it is renamed; a random UUID (a timestamp would also work) is appended so the same file is not picked up again
                    File destFile = new File(file.getAbsolutePath() + "_" + UUID.randomUUID().toString() + ".completed");
                    FileUtils.moveFile(file, destFile);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }
    }

    /**
     * Bolt that splits each received line into words and emits them to the next node
     */
    static class SplitBolt extends BaseRichBolt {

        private Map conf;   // configuration of the current component
        private TopologyContext context;    // context object of the current component
        private OutputCollector collector; // collector used to emit tuples

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.conf = stormConf;
            this.context = context;
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            String line = input.getStringByField("line");
            String[] words = line.split(" ");
            for (String word : words) {
                this.collector.emit(new Values(word, 1));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

    /**
     * Bolt that performs the word count
     */
    static class WCBolt extends BaseRichBolt {

        private Map conf;   // configuration of the current component
        private TopologyContext context;    // context object of the current component
        private OutputCollector collector; // collector used to emit tuples

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.conf = stormConf;
            this.context = context;
            this.collector = collector;
        }

        private Map<String, Integer> map = new HashMap<>();

        @Override
        public void execute(Tuple input) {
            if (!input.getSourceComponent().equalsIgnoreCase(Constants.SYSTEM_COMPONENT_ID) ) { // only run our business logic if the tuple was not sent by the system
                String word = input.getStringByField("word");
                Integer count = input.getIntegerByField("count");
            /*if (map.containsKey(word)) {
                map.put(word, map.get(word) + 1);
            } else {
                map.put(word, 1);
            }*/
                map.put(word, map.getOrDefault(word, 0) + 1);

                System.out.println("====================================");
                map.forEach((k, v) -> {
                    System.out.println(k + ":::" + v);
                });
            } else {
                System.out.println("sumBolt: " + input.getSourceComponent().toString() + "---" + System.currentTimeMillis());
            }
        }

        @Override
        public Map<String, Object> getComponentConfiguration() { // override this bolt's local (component-level) configuration
            Map<String, Object> config = new HashMap<>();
            config.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
            return config;
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {

        }
    }

    /**
     * Build the topology by assembling the Spout and Bolt nodes, analogous to building a Job in MapReduce
     */
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // dag
        builder.setSpout("id_file_spout", new FileSpout());
        builder.setBolt("id_split_bolt", new SplitBolt()).shuffleGrouping("id_file_spout");
        builder.setBolt("id_wc_bolt", new WCBolt()).shuffleGrouping("id_split_bolt");

        StormTopology stormTopology = builder.createTopology();
        LocalCluster cluster = new LocalCluster();
        String topologyName = QuartzPartWCTopology.class.getSimpleName();
        Config config = new Config();
        cluster.submitTopology(topologyName, config, stormTopology);
    }
}

The output is as follows:

sumBolt: __system---1523631954330
sumBolt: __system---1523631964330
sumBolt: __system---1523631974329
sumBolt: __system---1523631984329
sumBolt: __system---1523631994330
sumBolt: __system---1523632004330
sumBolt: __system---1523632014329
sumBolt: __system---1523632024330
......

Introduction to Storm UI parameters


  • deactivate: the topology is in the inactive (paused) state

  • emitted: the number of tuples emitted

  • transferred: the number of tuples transferred

    Difference from emitted: if a task emits one tuple to 2 downstream tasks, the transferred count is twice the emitted count.

  • complete latency: the average time from when the spout emits a tuple to when the spout acks that tuple (it can be regarded as the processing time of the tuple and its entire tuple tree)

  • process latency: the average time from when a bolt receives a tuple to when the bolt acks it; if the acker mechanism is not enabled, this value is 0

  • execute latency: the average time a bolt spends processing one tuple in its execute method (excluding the ack operation), in milliseconds

  • capacity: the closer this value is to 1, the more of its time the bolt or spout spends inside the execute method, which means its parallelism is insufficient and the number of executors for this component should be increased.

Summary: execute latency and process latency reflect how promptly messages are processed, while capacity indicates whether the processing capacity is saturated. From these three parameters you can tell where the topology's bottleneck lies.
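
For reference, the capacity value shown in the UI is (to the best of my understanding) roughly computed as:

capacity = (number of tuples executed in the window * average execute latency in ms) / (window length in ms)

so a value close to 1 means the executor spent nearly the whole measurement window (for example, the last 10 minutes shown in the UI) inside the execute method.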
