Storm worker process deadlock

Worker process deadlock [0.9.5]

On Storm 0.9.5 we hit a deadlock in the Netty messaging layer. It has been observed only once, so it appears to occur rarely.

Deadlock stack trace

Found one Java-level deadlock:
=============================
"Thread-12-disruptor-worker-transfer-queue":
  waiting to lock monitor 0x00007f85e000aee8 (object 0x00000007b4ffc8e8, a java.lang.Object),
  which is held by "client-worker-3"
"client-worker-3":
  waiting to lock monitor 0x00007f85dc021ef8 (object 0x000000079d717418, a backtype.storm.messaging.netty.Client),
  which is held by "Thread-12-disruptor-worker-transfer-queue"
 
  
"Thread-12-disruptor-worker-transfer-queue" prio=10 tid=0x00007f8750cb9000 nid=0x8a3 waiting for monitor entry [0x00007f86627e6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:398)
        - waiting to lock <0x00000007b4ffc8e8> (a java.lang.Object)
        at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
        at org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
        at org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
        at org.apache.storm.netty.channel.Channels.write(Channels.java:725)
        at org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
        at org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
        at org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
        at org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
        at org.apache.storm.netty.channel.Channels.write(Channels.java:704)
        at org.apache.storm.netty.channel.Channels.write(Channels.java:671)
        at org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
        at backtype.storm.messaging.netty.Client.flushMessages(Client.java:480)
        - locked <0x000000079d717418> (a backtype.storm.messaging.netty.Client)
        at backtype.storm.messaging.netty.Client.send(Client.java:400)
        - locked <0x000000079d717418> (a backtype.storm.messaging.netty.Client)
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__6940$fn__6941.invoke(worker.clj:336)
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__6940.invoke(worker.clj:334)
        at backtype.storm.disruptor$clojure_handler$reify__1605.onEvent(disruptor.clj:58)
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
        at backtype.storm.disruptor$consume_loop_STAR_$fn__1618.invoke(disruptor.clj:94)
        at backtype.storm.util$async_loop$fn__459.invoke(util.clj:463)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:745)
 
"client-worker-3" prio=10 tid=0x00007f8750d36800 nid=0x813 waiting for monitor entry [0x00007f86d0d65000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at backtype.storm.messaging.netty.Client.closeChannelAndReconnect(Client.java:501)
        - waiting to lock <0x000000079d717418> (a backtype.storm.messaging.netty.Client)
        at backtype.storm.messaging.netty.Client.access$1400(Client.java:78)
        at backtype.storm.messaging.netty.Client$3.operationComplete(Client.java:492)
        at org.apache.storm.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
        at org.apache.storm.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:413)
        at org.apache.storm.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
        at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:417)
        - locked <0x00000007b4ffc8e8> (a java.lang.Object)
        at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:373)
        at org.apache.storm.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
        at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
        at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.apache.storm.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
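
The dump above shows a lock-order inversion. The transfer thread holds the Client monitor (taken in Client.send) and waits for the Object locked in AbstractNioWorker.cleanUpWriteBuffer. Meanwhile client-worker-3 holds that Object and, through the future listener Client$3.operationComplete, waits for the Client monitor in closeChannelAndReconnect. The following is a minimal sketch of that pattern, not Storm or Netty code: the class, method, and thread names are hypothetical, and the two plain Objects only stand in for the monitors in the dump. It also shows how such a cycle can be detected from inside the JVM with the standard ThreadMXBean.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockOrderSketch {
    // Count down, then wait until the other thread also holds its first lock.
    static void holdUntilBothReady(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Start a daemon thread that locks `first`, then tries to lock `second`.
    static Thread startDaemon(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                holdUntilBothReady(latch);
                synchronized (second) {
                    // Not reached: the other thread already holds `second`.
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        t.start();
        return t;
    }

    // Returns how many of the two sketch threads the JVM reports as deadlocked.
    static int detectSketchDeadlock() {
        Object clientMonitor = new Object(); // stand-in for the netty.Client instance
        Object writeLock = new Object();     // stand-in for the Object in cleanUpWriteBuffer
        CountDownLatch latch = new CountDownLatch(2);

        // Same two locks, opposite acquisition order.
        Thread transfer = startDaemon("transfer-sketch", clientMonitor, writeLock, latch);
        Thread nio = startDaemon("nio-sketch", writeLock, clientMonitor, latch);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5000;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
            int found = 0;
            if (ids != null) {
                for (long id : ids) {
                    if (id == transfer.getId() || id == nio.getId()) {
                        found++;
                    }
                }
            }
            if (found == 2) {
                return found;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return found;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked sketch threads: " + detectSketchDeadlock());
    }
}
```

In general, a cycle like this goes away if both code paths take the two locks in the same order, or if the listener callback does not need the second lock while the first is still held.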

Solution

Upgrade Storm to 0.9.6 or later.

https://issues.apache.org/jira/browse/STORM-839


Reposted from woodding2008.iteye.com/blog/2261352