Backpressure mechanism
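The direct consequence of Storm's immature backpressure mechanism is that a traffic peak, or an inaccurate traffic estimate, causes a topology's workers to run out of memory (OOM) and to be rescheduled frequently. Storm 1.0 already uses a new backpressure mechanism. Community solution: https://issues.apache.org/jira/browse/STORM-886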
https://github.com/apache/storm/pull/700
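Backpressure process
- When a worker executor's receive queue rises above the high watermark, it notifies the backpressure thread
- The worker's backpressure thread notifies ZooKeeper of the executor-busy event
- All workers watch ZooKeeper for the executor-busy event
- The spouts in each worker reduce the rate at which they emit tuples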
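The steps above can be modeled in a single process. The following is a minimal sketch under stated assumptions, not Storm's implementation: the class name, the `executorBusy` flag (a stand-in for the busy event published through ZooKeeper), the threshold values, and the low watermark that releases the throttle are all hypothetical choices made for illustration. The spout here simply skips emits while the flag is set, which is one way of reducing its emit rate.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy single-process model of watermark-based backpressure; this is not Storm code. */
final class BackpressureSketch {
    static final int HIGH_WATERMARK = 8; // assumed threshold, in tuples
    static final int LOW_WATERMARK = 4;  // assumed threshold for releasing the throttle

    // Hypothetical stand-in for the "executor busy" event published through ZooKeeper.
    private final AtomicBoolean executorBusy = new AtomicBoolean(false);
    private final Queue<Integer> receiveQueue = new ArrayDeque<>();
    int maxDepth = 0;
    int rejectedEmits = 0;

    /** Executor side: raise the flag above the high watermark, clear it at the low one. */
    private void checkWatermarks() {
        if (receiveQueue.size() > HIGH_WATERMARK) {
            executorBusy.set(true);
        } else if (receiveQueue.size() <= LOW_WATERMARK) {
            executorBusy.set(false);
        }
        maxDepth = Math.max(maxDepth, receiveQueue.size());
    }

    /** Spout side: emit only while no executor is reported busy. */
    boolean tryEmit(int tuple) {
        if (executorBusy.get()) {
            rejectedEmits++;
            return false;
        }
        receiveQueue.add(tuple);
        checkWatermarks();
        return true;
    }

    /** Bolt side: process one tuple, then re-check the watermarks. */
    void consumeOne() {
        receiveQueue.poll();
        checkWatermarks();
    }

    /** Each step the spout attempts three emits and the bolt consumes one tuple. */
    void simulate(int steps) {
        int next = 0;
        for (int i = 0; i < steps; i++) {
            for (int j = 0; j < 3; j++) {
                tryEmit(next++);
            }
            consumeOne();
        }
    }

    public static void main(String[] args) {
        BackpressureSketch sketch = new BackpressureSketch();
        sketch.simulate(100);
        System.out.println("max queue depth: " + sketch.maxDepth);
        System.out.println("rejected emits: " + sketch.rejectedEmits);
    }
}
```

Although the spout tries to produce three times faster than the bolt consumes, the queue depth stays close to the high watermark instead of growing without bound, which is the property that protects the worker from OOM.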
Backpressure before Storm 1.0
If spout tuples are emitted without a message id, TOPOLOGY_MAX_SPOUT_PENDING has no effect.
public static final String TOPOLOGY_MAX_SPOUT_PENDING="topology.max.spout.pending";
public static final Object TOPOLOGY_MAX_SPOUT_PENDING_SCHEMA = ConfigValidation.IntegerValidator;

The maximum number of tuples that can be pending on a spout task at any given time. This config applies to individual tasks, not to spouts or topologies as a whole. A pending tuple is one that has been emitted from a spout but has not been acked or failed yet. Note that this config parameter has no effect for unreliable spouts that don't tag their tuples with a message id.
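To make the role of the message id concrete, here is a minimal sketch of a pending map, written as a toy model rather than Storm's own code. The class and method names (`PendingSketch`, `emitWithMessageId`, `emitWithoutMessageId`) are hypothetical. It assumes, as the description above states, that only tuples tagged with a message id are tracked as pending, so an unreliable spout never reaches the limit.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a spout task's pending map; this is not Storm's implementation. */
final class PendingSketch {
    private final Map<Long, String> pending = new HashMap<>();
    private final Integer maxSpoutPending; // null means the limit is not configured
    private long nextId = 0;

    PendingSketch(Integer maxSpoutPending) {
        this.maxSpoutPending = maxSpoutPending;
    }

    /** The spout may emit while the pending count is below the limit. */
    boolean canEmit() {
        return maxSpoutPending == null || pending.size() < maxSpoutPending;
    }

    /** Reliable emit: the tuple is tracked until it is acked or failed. */
    long emitWithMessageId(String tuple) {
        long id = nextId++;
        pending.put(id, tuple);
        return id;
    }

    /** Unreliable emit: no message id, so nothing is tracked. */
    void emitWithoutMessageId(String tuple) {
        // Intentionally empty: untracked tuples never count as pending.
    }

    /** An ack (or a fail) removes the tuple from the pending map. */
    void ack(long id) {
        pending.remove(id);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingSketch reliable = new PendingSketch(2);
        long first = reliable.emitWithMessageId("a");
        reliable.emitWithMessageId("b");
        System.out.println("reliable, limit reached, canEmit = " + reliable.canEmit());
        reliable.ack(first);
        System.out.println("reliable, after one ack, canEmit = " + reliable.canEmit());
    }
}
```

With a limit of 2, two tracked emits block further emits until one of them is acked. Emitting any number of tuples through `emitWithoutMessageId` leaves the pending count at zero, so the limit never applies.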
How the spout executes the nextTuple logic
(fn []
  ;; This design requires that spouts be non-blocking
  ;; Take a batch of tuples from the receive-queue and handle them with tuple-action-fn
  (disruptor/consume-batch receive-queue event-handler)

  ;; try to clear the overflow-buffer:
  ;; move the data in the overflow-buffer into the outgoing transfer queue
  (try-cause
    (while (not (.isEmpty overflow-buffer))
      (let [[out-task out-tuple] (.peek overflow-buffer)]
        (transfer-fn out-task out-tuple false nil)
        (.removeFirst overflow-buffer)))
    (catch InsufficientCapacityException e
      ))

  (let [active? @(:storm-active-atom executor-data)
        curr-count (.get emitted-count)]
    ;; The spout may keep emitting tuples only when the overflow-buffer is empty
    ;; and pending has not reached its upper limit
    (if (and (.isEmpty overflow-buffer)
             (or (not max-spout-pending)
                 (< (.size pending) max-spout-pending)))
      (if active? ;; is the topology active?
        (do ;; active
          (when-not @last-active ;; the spout is currently inactive
            (reset! last-active true)
            (log-message "Activating spout " component-id ":" (keys task-datas))
            (fast-list-iter [^ISpout spout spouts] (.activate spout))) ;; activate the spout first
          (fast-list-iter [^ISpout spout spouts] (.nextTuple spout))) ;; call nextTuple to produce new tuples
        (do ;; inactive
          (when @last-active ;; the spout is currently active
            (reset! last-active false)
            (log-message "Deactivating spout " component-id ":" (keys task-datas))
            (fast-list-iter [^ISpout spout spouts] (.deactivate spout))) ;; deactivate the spout, then sleep
          ;; TODO: log that it's getting throttled
          (Time/sleep 100))))
    ;; No new tuple was emitted (emitted-count is unchanged)
    (if (and (= curr-count (.get emitted-count)) active?)
      (do (.increment empty-emit-streak)
          (.emptyEmit spout-wait-strategy (.get empty-emit-streak))) ;; sleep according to spout-wait-strategy
      (.set empty-emit-streak 0)
      ))
  0)) ;; return 0, meaning the async-loop sleeps for 0 between iterations
:kill-fn (:report-error-and-die executor-data)
:factory? true
:thread-name component-id)]))

The number of pending tuples is limited to p*num-tasks, where p is TOPOLOGY-MAX-SPOUT-PENDING and num-tasks is the number of spout tasks handled by the executor.

max-spout-pending (executor-max-spout-pending storm-conf (count task-datas))

(defn executor-max-spout-pending [storm-conf num-tasks]
  (let [p (storm-conf TOPOLOGY-MAX-SPOUT-PENDING)]
    (if p (* p num-tasks))))
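As a worked example of the p*num-tasks rule, the sketch below restates executor-max-spout-pending and the emit condition from the loop in Java. It is an illustration only: the class and method names are hypothetical, and a null limit stands for "TOPOLOGY-MAX-SPOUT-PENDING is not set".

```java
/** Java restatement of the p * num-tasks rule; names are illustrative, not Storm's API. */
final class MaxPendingSketch {

    /** Returns p * numTasks, or null when the per-task limit p is not configured. */
    static Integer executorMaxSpoutPending(Integer p, int numTasks) {
        return p == null ? null : p * numTasks;
    }

    /**
     * Mirrors the condition guarding nextTuple: the overflow buffer must be empty
     * and the pending count must be below the executor-wide limit (if any).
     */
    static boolean mayCallNextTuple(boolean overflowEmpty, int pendingSize, Integer maxPending) {
        return overflowEmpty && (maxPending == null || pendingSize < maxPending);
    }

    public static void main(String[] args) {
        // Per-task limit of 1000 with 3 spout tasks in the executor.
        Integer limit = executorMaxSpoutPending(1000, 3);
        System.out.println("executor-wide limit: " + limit);
        System.out.println("emit at 2999 pending: " + mayCallNextTuple(true, 2999, limit));
        System.out.println("emit at 3000 pending: " + mayCallNextTuple(true, 3000, limit));
    }
}
```

With a per-task limit of 1000 and three tasks, the executor stops calling nextTuple once 3000 tuples are pending. Without a configured limit, only the overflow buffer can hold the spout back.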
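Problems caused by immature backpressure
A poorly chosen fieldsGrouping, or a traffic peak, makes a bolt's receive queue grow sharply and leads to OOM. Improving backpressure solves this problem.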
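The effect of a skewed fieldsGrouping can be sketched as follows. This is a simplified model, assuming that a fields grouping routes each tuple by a hash of its grouping key modulo the number of target tasks. The class, the method names, and the key distribution are hypothetical and only illustrate how one hot key concentrates load on a single bolt task.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of hash-based routing; it only illustrates key skew, not Storm's code. */
final class SkewSketch {

    /** Counts how many tuples land on each task when routing by hash(key) mod numTasks. */
    static int[] route(List<String> keys, int numTasks) {
        int[] perTask = new int[numTasks];
        for (String key : keys) {
            perTask[Math.floorMod(key.hashCode(), numTasks)]++;
        }
        return perTask;
    }

    /** Builds a workload with one hot key repeated hotCount times plus coldCount distinct keys. */
    static List<String> skewedKeys(int hotCount, int coldCount) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < hotCount; i++) {
            keys.add("hot");
        }
        for (int i = 0; i < coldCount; i++) {
            keys.add("cold-" + i);
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] perTask = route(skewedKeys(90, 10), 4);
        for (int task = 0; task < perTask.length; task++) {
            System.out.println("task " + task + ": " + perTask[task] + " tuples");
        }
    }
}
```

All 90 tuples carrying the hot key go to the same task regardless of how many tasks exist, so adding parallelism does not relieve that task's receive queue. Only throttling the source, which is what backpressure does, keeps the queue bounded.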
Further reading: http://www.cnblogs.com/fxjwind/p/3238648.html