Worker process deadlock [0.9.5]
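A deadlock was observed in Netty communication on Storm 0.9.5. It has been seen only once, so it occurs rarely. The thread dump shows a lock-order inversion: the transfer thread holds the backtype.storm.messaging.netty.Client monitor (taken in Client.send and Client.flushMessages) and waits for the lock that Netty takes in AbstractNioWorker.cleanUpWriteBuffer, while the Netty thread client-worker-3 holds that lock and waits for the Client monitor in Client.closeChannelAndReconnect.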
Deadlock stack trace
Found one Java-level deadlock:
=============================
"Thread-12-disruptor-worker-transfer-queue":
  waiting to lock monitor 0x00007f85e000aee8 (object 0x00000007b4ffc8e8, a java.lang.Object),
  which is held by "client-worker-3"
"client-worker-3":
  waiting to lock monitor 0x00007f85dc021ef8 (object 0x000000079d717418, a backtype.storm.messaging.netty.Client),
  which is held by "Thread-12-disruptor-worker-transfer-queue"

"Thread-12-disruptor-worker-transfer-queue" prio=10 tid=0x00007f8750cb9000 nid=0x8a3 waiting for monitor entry [0x00007f86627e6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:398)
    - waiting to lock <0x00000007b4ffc8e8> (a java.lang.Object)
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
    at org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
    at org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
    at org.apache.storm.netty.channel.Channels.write(Channels.java:725)
    at org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
    at org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
    at org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
    at org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
    at org.apache.storm.netty.channel.Channels.write(Channels.java:704)
    at org.apache.storm.netty.channel.Channels.write(Channels.java:671)
    at org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
    at backtype.storm.messaging.netty.Client.flushMessages(Client.java:480)
    - locked <0x000000079d717418> (a backtype.storm.messaging.netty.Client)
    at backtype.storm.messaging.netty.Client.send(Client.java:400)
    - locked <0x000000079d717418> (a backtype.storm.messaging.netty.Client)
    at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
    at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__6940$fn__6941.invoke(worker.clj:336)
    at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__6940.invoke(worker.clj:334)
    at backtype.storm.disruptor$clojure_handler$reify__1605.onEvent(disruptor.clj:58)
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
    at backtype.storm.disruptor$consume_loop_STAR_$fn__1618.invoke(disruptor.clj:94)
    at backtype.storm.util$async_loop$fn__459.invoke(util.clj:463)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:745)

"client-worker-3" prio=10 tid=0x00007f8750d36800 nid=0x813 waiting for monitor entry [0x00007f86d0d65000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at backtype.storm.messaging.netty.Client.closeChannelAndReconnect(Client.java:501)
    - waiting to lock <0x000000079d717418> (a backtype.storm.messaging.netty.Client)
    at backtype.storm.messaging.netty.Client.access$1400(Client.java:78)
    at backtype.storm.messaging.netty.Client$3.operationComplete(Client.java:492)
    at org.apache.storm.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
    at org.apache.storm.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:413)
    at org.apache.storm.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:417)
    - locked <0x00000007b4ffc8e8> (a java.lang.Object)
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:373)
    at org.apache.storm.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.apache.storm.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
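Because the deadlock is rare, it may help to check for it programmatically instead of waiting for someone to take a thread dump. The following is a minimal sketch, not part of Storm: the class name DeadlockCheck and the method describeDeadlocks are hypothetical, and it uses only the standard java.lang.management API to report the same kind of monitor cycle that the dump above shows. It has to run inside the JVM being checked (for example, in the worker process).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Storm: reports deadlocked threads in this JVM.
public class DeadlockCheck {

    /** Returns one description per deadlocked thread; empty if there is no deadlock. */
    public static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Returns null (not an empty array) when no threads are deadlocked.
        long[] ids = mx.findDeadlockedThreads();
        List<String> result = new ArrayList<String>();
        if (ids == null) {
            return result;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            result.add(String.format("\"%s\" is waiting to lock %s, held by \"%s\"",
                    info.getThreadName(), info.getLockName(), info.getLockOwnerName()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> deadlocks = describeDeadlocks();
        if (deadlocks.isEmpty()) {
            System.out.println("No deadlock detected");
        }
        for (String line : deadlocks) {
            System.out.println(line);
        }
    }
}
```

Calling describeDeadlocks() periodically from a monitoring thread and logging any non-empty result is one possible way to notice the problem early. Detection does not resolve the deadlock, so the worker still has to be restarted.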
Solution
Upgrade Storm to version 0.9.6 or later.
https://issues.apache.org/jira/browse/STORM-839