Storm TickTuple stops unexpectedly

Storm's tick tuples are commonly used to drive sliding-window logic, for example to make a custom bolt flush its data to storage at a fixed interval. While using them, we ran into a case where the tick tuples "unexpectedly stopped" arriving.

 

Scenario description

The Jiaodian job uses 12 workers in total, and the tick tuple interval is 5 minutes.

WebPvLogSpout and WebPvLogBolt each have 12 executors.

WebPvLogSpout consumes the Kafka topic log_product_ypvlog, which has 10 partitions in total.

The job was started on the afternoon of June 14th. After 1:35 am the next day, 2 of the bolts stopped receiving tick tuples.

The thread that produces the tick tuple messages [user-timer] was stuck in a parked (suspended) state inside the disruptor queue's publish call:

"user-timer" daemon prio=10 tid=0x00007f8ea8ac7000 nid=0x353c runnable [0x00007f8e29662000]
   java.lang.Thread.State: TIMED_WAITING (parking) [the thread is parked, waiting to be woken up; normally it should be in the sleeping state]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:349)
at com.lmax.disruptor.AbstractMultithreadedClaimStrategy.waitForFreeSlotAt(AbstractMultithreadedClaimStrategy.java:99)
at com.lmax.disruptor.AbstractMultithreadedClaimStrategy.incrementAndGet(AbstractMultithreadedClaimStrategy.java:49)
at com.lmax.disruptor.Sequencer.next(Sequencer.java:127)
at backtype.storm.utils.DisruptorQueue.publishDirect(DisruptorQueue.java:174)
at backtype.storm.utils.DisruptorQueue.publish(DisruptorQueue.java:167)
at backtype.storm.disruptor$publish.invoke(disruptor.clj:66)
at backtype.storm.disruptor$publish.invoke(disruptor.clj:68)
at backtype.storm.daemon.executor$setup_ticks_BANG_$fn__6510.invoke(executor.clj:315)
at backtype.storm.timer$schedule_recurring$this__1807.invoke(timer.clj:99)
at backtype.storm.timer$mk_timer$fn__1790$fn__1791.invoke(timer.clj:50)
at backtype.storm.timer$mk_timer$fn__1790.invoke(timer.clj:42)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:745)
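
To make the difference between the two states in the dump concrete, here is a minimal sketch using only the JDK (the class and method names are hypothetical and not part of Storm). It starts one thread that waits in LockSupport.parkNanos and one that waits in Thread.sleep. Both should report Thread.State.TIMED_WAITING through the Java API; it is only jstack that annotates the state as "(parking)" or "(sleeping)", which is why the dump is needed to tell a blocked publisher apart from an idle timer.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

final class ParkVsSleep {
    // Starts a daemon thread running `body`, waits until it reports TIMED_WAITING
    // (or 2 seconds pass), and returns the state that was observed.
    static Thread.State observe(Runnable body) {
        Thread t = new Thread(body, "demo-timer");
        t.setDaemon(true);
        t.start();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
        try {
            while (t.getState() != Thread.State.TIMED_WAITING
                    && System.nanoTime() < deadline) {
                Thread.sleep(5);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
        }
        Thread.State seen = t.getState();
        t.interrupt(); // let the demo thread exit
        return seen;
    }

    // Waits the way a publisher waits for a free queue slot: by parking.
    static Thread.State parked() {
        return observe(() -> {
            // Loop because parkNanos is allowed to return spuriously.
            while (!Thread.currentThread().isInterrupted()) {
                LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(5));
            }
        });
    }

    // Waits the way an idle timer waits for its next deadline: by sleeping.
    static Thread.State sleeping() {
        return observe(() -> {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public static void main(String[] args) {
        System.out.println("parked:   " + parked());
        System.out.println("sleeping: " + sleeping());
    }
}
```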

 

Problem analysis

The Kafka client we wrapped blocks any spout that has not been assigned a partition [in ArrayBlockingQueue.take()]. With 12 spout executors and only 10 partitions, 2 spouts are left without a partition.
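
The number of spouts left without a partition can be checked with a small sketch. It assumes a simple round-robin assignment, which is only an illustration: the assignment strategy of the wrapped client is not shown in this post, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionAssignment {
    // Round-robin: partition p goes to spout (p % spoutCount).
    // This strategy is an assumption made for illustration only.
    static List<List<Integer>> assign(int partitionCount, int spoutCount) {
        List<List<Integer>> bySpout = new ArrayList<>();
        for (int s = 0; s < spoutCount; s++) {
            bySpout.add(new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            bySpout.get(p % spoutCount).add(p);
        }
        return bySpout;
    }

    // Counts the spouts that own no partition and would therefore block.
    static int idleSpouts(int partitionCount, int spoutCount) {
        int idle = 0;
        for (List<Integer> owned : assign(partitionCount, spoutCount)) {
            if (owned.isEmpty()) {
                idle++;
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        // 10 partitions, 12 spout executors: the setup described in this post.
        System.out.println("idle spouts with 12 executors: " + idleSpouts(10, 12)); // 2
        // 10 partitions, 10 spout executors: every spout owns one partition.
        System.out.println("idle spouts with 10 executors: " + idleSpouts(10, 10)); // 0
    }
}
```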

jstack dumps taken over several hours all show user-timer in TIMED_WAITING (parking), while the same thread in the other workers is in the sleeping state.

The spout had been blocked for several hours. During that time its receive queue filled up with metrics and system stream messages that were never processed. Once the queue is full, the TickTuple message cannot be put into it, so the user-timer thread stays parked indefinitely, waiting to be woken up.
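
The mechanism can be reproduced in miniature. The sketch below uses a JDK ArrayBlockingQueue as a stand-in for the executor's receive queue; Storm itself uses an LMAX Disruptor ring buffer, so this is an analogy under that assumption, and all names are hypothetical. A timed offer is used so that the demo terminates, whereas the real user-timer thread keeps retrying without a deadline.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class StalledTickDemo {
    // Fills a bounded queue with messages that nobody consumes (the blocked spout),
    // then tries to publish a tick. Returns true if the tick was accepted in time.
    static boolean tryPublishTick(int capacity, int pendingMessages, long timeoutMillis) {
        BlockingQueue<String> receiveQueue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < pendingMessages; i++) {
            // Non-blocking offer: anything beyond the capacity is simply not enqueued.
            receiveQueue.offer("metrics-" + i);
        }
        try {
            // The timer side: waits for a free slot, gives up after the timeout.
            return receiveQueue.offer("tick", timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Queue already full and no consumer: the tick cannot be published.
        System.out.println("full queue:     " + tryPublishTick(4, 4, 50)); // false
        // One free slot left: the tick goes through immediately.
        System.out.println("one free slot:  " + tryPublishTick(4, 3, 50)); // true
    }
}
```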

 

Solution

Set the number of spout executors equal to the number of partitions of the Kafka topic (10 in this case), so that every spout is assigned a partition. In practice this prevents a spout from being blocked for a long time.

 

 

Related issue in the official Apache Storm JIRA

https://issues.apache.org/jira/browse/STORM-299
