HMaster node goes down for no apparent reason

Copyright notice: if you wish to repost this article, please contact the author. https://blog.csdn.net/liu16659/article/details/82430396

1. Error message:

2018-09-05 18:40:58,483 FATAL [main-EventThread] master.HMaster: Master server abort: loaded coprocessors are: []
2018-09-05 18:40:58,483 FATAL [main-EventThread] master.HMaster: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase master:60000-0x400000393770000 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:692)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:624)
    at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)

2. Detailed log

2018-09-05 18:40:58,468 INFO  [main-SendThread(server6:2181)] zookeeper.ClientCnxn: Opening socket connection to server server6/192.168.211.6:2181. Will not attempt to authenticate using SASL (unknown error)
2018-09-05 18:40:58,468 INFO  [main-SendThread(server6:2181)] zookeeper.ClientCnxn: Socket connection established to server6/192.168.211.6:2181, initiating session
2018-09-05 18:40:58,473 WARN  [main-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x400000393770000 has expired
2018-09-05 18:40:58,473 INFO  [main-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x400000393770000 has expired, closing socket connection
2018-09-05 18:40:58,483 FATAL [main-EventThread] master.HMaster: Master server abort: loaded coprocessors are: []
2018-09-05 18:40:58,483 FATAL [main-EventThread] master.HMaster: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase master:60000-0x400000393770000 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:692)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:624)
    at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
2018-09-05 18:40:58,520 INFO  [main-EventThread] regionserver.HRegionServer: STOPPED: Stopped by main-EventThread
2018-09-05 18:40:58,528 INFO  [master/server4/192.168.211.4:60000] regionserver.HRegionServer: Stopping infoServer
2018-09-05 18:40:58,849 INFO  [server4:60000.activeMasterManager-SendThread(server4:2181)] zookeeper.ClientCnxn: Opening socket connection to server server4/192.168.211.4:2181. Will not attempt to authenticate using SASL (unknown error)
2018-09-05 18:40:58,849 INFO  [server4:60000.activeMasterManager-SendThread(server4:2181)] zookeeper.ClientCnxn: Socket connection established to server4/192.168.211.4:2181, initiating session
2018-09-05 18:40:58,850 INFO  [server4:60000.activeMasterManager-SendThread(server4:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x400000393770003, likely server has closed socket, closing socket connection and attempting reconnect
2018-09-05 18:40:58,851 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x400000393770000
2018-09-05 18:41:03,261 INFO  [server4,60000,1535980776013_splitLogManager__ChoreService_1] hbase.ScheduledChore: Chore: SplitLogManager Timeout Monitor missed its start time
2018-09-05 18:41:03,262 INFO  [server4,60000,1535980776013_splitLogManager__ChoreService_1] hbase.ScheduledChore: Chore: SplitLogManager Timeout Monitor was stopped
2018-09-05 18:41:03,273 INFO  [server4:60000.activeMasterManager-SendThread(server6:2181)] zookeeper.ClientCnxn: Opening socket connection to server server6/192.168.211.6:2181. Will not attempt to authenticate using SASL (unknown error)
2018-09-05 18:41:03,689 INFO  [master/server4/192.168.211.4:60000-SendThread(server6:2181)] zookeeper.ClientCnxn: Opening socket connection to server server6/192.168.211.6:2181. Will not attempt to authenticate using SASL (unknown error)
2018-09-05 18:41:04,329 INFO  [server4:60000.activeMasterManager-SendThread(server6:2181)] zookeeper.ClientCnxn: Socket connection established to server6/192.168.211.6:2181, initiating session
2018-09-05 18:41:04,329 INFO  [master/server4/192.168.211.4:60000-SendThread(server6:2181)] zookeeper.ClientCnxn: Socket connection established to server6/192.168.211.6:2181, initiating session
2018-09-05 18:41:04,335 WARN  [master/server4/192.168.211.4:60000-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x5000003d3f10000 has expired
2018-09-05 18:41:04,335 INFO  [master/server4/192.168.211.4:60000-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x5000003d3f10000 has expired, closing socket connection
2018-09-05 18:41:04,335 WARN  [master/server4/192.168.211.4:60000-EventThread] client.ConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:692)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:624)
    at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
2018-09-05 18:41:04,335 INFO  [master/server4/192.168.211.4:60000-EventThread] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x5000003d3f10000
2018-09-05 18:41:04,335 INFO  [master/server4/192.168.211.4:60000-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x5000003d3f10000
2018-09-05 18:41:04,336 WARN  [server4:60000.activeMasterManager-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x5000003d3f10001 has expired
2018-09-05 18:41:04,336 INFO  [server4:60000.activeMasterManager-SendThread(server6:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x5000003d3f10001 has expired, closing socket connection
2018-09-05 18:41:04,336 WARN  [server4:60000.activeMasterManager-EventThread] client.ConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:692)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:624)
    at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
2018-09-05 18:41:05,015 INFO  [server4:60000.activeMasterManager-EventThread] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x5000003d3f10001
2018-09-05 18:41:05,015 INFO  [server4:60000.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x5000003d3f10001
2018-09-05 18:41:05,706 INFO  [server4:60000.activeMasterManager-SendThread(server5:2181)] zookeeper.ClientCnxn: Opening socket connection to server server5/192.168.211.5:2181. Will not attempt to authenticate using SASL (unknown error)
2018-09-05 18:41:05,706 INFO  [server4:60000.activeMasterManager-SendThread(server5:2181)] zookeeper.ClientCnxn: Socket connection established to server5/192.168.211.5:2181, initiating session
2018-09-05 18:41:09,357 WARN  [server4:60000.activeMasterManager-SendThread(server5:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x400000393770003 has expired
2018-09-05 18:41:09,357 INFO  [server4:60000.activeMasterManager-SendThread(server5:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x400000393770003 has expired, closing socket connection
2018-09-05 18:41:09,357 INFO  [server4:60000.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x400000393770003
2018-09-05 18:41:15,059 WARN  [server4,60000,1535980776013_ChoreService_2] zookeeper.ZKUtil: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Unable to list children of znode /hbase/replication/peers 
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/peers
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1532)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:319)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:462)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:490)
    at org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl.getAllPeerIds(ReplicationPeersZKImpl.java:392)
    at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleaner.getUnDeletedQueues(ReplicationZKNodeCleaner.java:80)
    at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleanerChore.chore(ReplicationZKNodeCleanerChore.java:48)
    at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:189)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:16,601 ERROR [server4,60000,1535980776013_ChoreService_2] zookeeper.ZooKeeperWatcher: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/peers
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1532)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:319)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:462)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:490)
    at org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl.getAllPeerIds(ReplicationPeersZKImpl.java:392)
    at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleaner.getUnDeletedQueues(ReplicationZKNodeCleaner.java:80)
    at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleanerChore.chore(ReplicationZKNodeCleanerChore.java:48)
    at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:189)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:16,601 ERROR [server4,60000,1535980776013_ChoreService_2] hbase.ScheduledChore: Caught error
java.lang.NullPointerException
    at java.util.HashSet.<init>(HashSet.java:119)
    at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleaner.getUnDeletedQueues(ReplicationZKNodeCleaner.java:80)
    at org.apache.hadoop.hbase.master.cleaner.ReplicationZKNodeCleanerChore.chore(ReplicationZKNodeCleanerChore.java:48)
    at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:189)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:16,601 INFO  [server4,60000,1535980776013_ChoreService_2] hbase.ScheduledChore: Chore: server4,60000,1535980776013-DoMetricsChore missed its start time
2018-09-05 18:41:16,601 INFO  [server4,60000,1535980776013_ChoreService_2] hbase.ScheduledChore: Chore: server4,60000,1535980776013-DoMetricsChore was stopped
2018-09-05 18:41:15,577 WARN  [server4,60000,1535980776013_ChoreService_1] master.HMaster: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Unable to list backup servers
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/backup-masters
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1532)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:319)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:519)
    at org.apache.hadoop.hbase.master.HMaster.getClusterStatusWithoutCoprocessor(HMaster.java:2422)
    at org.apache.hadoop.hbase.master.balancer.ClusterStatusChore.chore(ClusterStatusChore.java:49)
    at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:189)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:18,871 INFO  [master/server4/192.168.211.4:60000] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2018-09-05 18:41:21,347 INFO  [master/server4/192.168.211.4:60000] procedure2.ProcedureExecutor: Stopping the procedure executor
2018-09-05 18:41:22,576 INFO  [master/server4/192.168.211.4:60000] wal.WALProcedureStore: Stopping the WAL Procedure Store
2018-09-05 18:41:23,348 INFO  [master/server4/192.168.211.4:60000] regionserver.HRegionServer: stopping server server4,60000,1535980776013
2018-09-05 18:41:23,349 INFO  [master/server4/192.168.211.4:60000] regionserver.HRegionServer: stopping server server4,60000,1535980776013; all regions closed.
2018-09-05 18:41:23,349 INFO  [master/server4/192.168.211.4:60000] hbase.ChoreService: Chore service for: server4,60000,1535980776013 had [[ScheduledChore: Name: server4,60000,1535980776013-ClusterStatusChore Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: server4,60000,1535980776013-BalancerChore Period: 300000 Unit: MILLISECONDS], [ScheduledChore: Name: server4,60000,1535980776013-RegionNormalizerChore Period: 1800000 Unit: MILLISECONDS], [ScheduledChore: Name: CatalogJanitor-server4:60000 Period: 300000 Unit: MILLISECONDS], [ScheduledChore: Name: HFileCleaner Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: CompactedHFilesCleaner Period: 120000 Unit: MILLISECONDS], [ScheduledChore: Name: LogsCleaner Period: 60000 Unit: MILLISECONDS]] on shutdown
2018-09-05 18:41:23,409 WARN  [master/server4/192.168.211.4:60000] zookeeper.ZKUtil: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Unable to get data of znode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:397)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:629)
    at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:269)
    at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1286)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1146)
    at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:23,410 ERROR [master/server4/192.168.211.4:60000] zookeeper.ZooKeeperWatcher: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:397)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:629)
    at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:269)
    at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1286)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1146)
    at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:23,410 ERROR [master/server4/192.168.211.4:60000] master.ActiveMasterManager: master:60000-0x400000393770000, quorum=server4:2181,server5:2181,server6:2181, baseZNode=/hbase Error deleting our own master address node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:397)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:629)
    at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:269)
    at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1286)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1146)
    at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:23,924 INFO  [master/server4/192.168.211.4:60000] hbase.ChoreService: Chore service for: server4,60000,1535980776013_splitLogManager_ had [] on shutdown
2018-09-05 18:41:24,133 INFO  [master/server4/192.168.211.4:60000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2018-09-05 18:41:24,416 INFO  [master/server4/192.168.211.4:60000] ipc.RpcServer: Stopping server on 60000
2018-09-05 18:41:24,481 INFO  [RpcServer.listener,port=60000] ipc.RpcServer: RpcServer.listener,port=60000: stopping
2018-09-05 18:41:24,639 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2018-09-05 18:41:24,639 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2018-09-05 18:41:27,497 WARN  [master/server4/192.168.211.4:60000] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/server4,60000,1535980776013
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:182)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1250)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1239)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1504)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1154)
    at java.lang.Thread.run(Thread.java:748)
2018-09-05 18:41:27,694 INFO  [master/server4/192.168.211.4:60000] regionserver.HRegionServer: stopping server server4,60000,1535980776013; zookeeper connection closed.
2018-09-05 18:41:27,694 INFO  [master/server4/192.168.211.4:60000] regionserver.HRegionServer: master/server4/192.168.211.4:60000 exiting

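3. Points related to the problem

  • A long GC pause can cause the ZooKeeper session to time out => check the GC log to see whether such a pause occurred shortly before the failure (the session expiry is logged at 2018-09-05 18:40:58).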
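The GC-log check above can be sketched as a small script. This is only a rough illustration, not the author's procedure: it assumes GC logging was enabled with date stamps (for example JDK 8 with -XX:+PrintGCDetails -XX:+PrintGCDateStamps) and that each event line starts with an ISO-style timestamp and ends with a "real=... secs" figure. The function name find_long_pauses, the 10-second threshold, the 10-minute window, and the two sample lines are all hypothetical; they are not taken from the cluster in this post.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape (hypothetical, JDK 8 style):
# 2018-09-05T18:40:10.200+0800: 170.100: [Full GC ...] [Times: user=1.00 sys=0.50, real=45.67 secs]
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")
REAL = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def find_long_pauses(lines, incident, threshold_secs=10.0,
                     window=timedelta(minutes=10)):
    """Return (start_time, pause_secs) for GC events that began within
    `window` before `incident` and whose wall-clock time >= threshold_secs."""
    hits = []
    for line in lines:
        stamp, real = STAMP.search(line), REAL.search(line)
        if not (stamp and real):
            continue  # not a GC event line in the assumed format
        when = datetime.strptime(stamp.group(1), "%Y-%m-%dT%H:%M:%S")
        pause = float(real.group(1))
        if pause >= threshold_secs and incident - window <= when <= incident:
            hits.append((when, pause))
    return hits


# Made-up sample lines for illustration only.
SAMPLE = [
    "2018-09-05T18:39:00.100+0800: 100.000: [GC (Allocation Failure) "
    "[Times: user=0.10 sys=0.00, real=0.05 secs]",
    "2018-09-05T18:40:10.200+0800: 170.100: [Full GC (Ergonomics) "
    "[Times: user=1.00 sys=0.50, real=45.67 secs]",
]

if __name__ == "__main__":
    # Time of the "Session expired" FATAL line in the HMaster log above.
    incident = datetime(2018, 9, 5, 18, 40, 58)
    for when, pause in find_long_pauses(SAMPLE, incident):
        print(when, pause)
```

In practice the threshold would be chosen close to the effective ZooKeeper session timeout of the cluster, and the input would be the lines of the HMaster's real GC log rather than SAMPLE.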

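4. Solution

5. References

6. Summary

  • This GC problem is also the biggest weakness of ZooKeeper-based distributed locks: if the process holding the lock pauses for longer than the session timeout, its session expires and the lock is released without the holder knowing.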
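To make the weakness above concrete, here is a toy model of a lock tied to a session lease. It is a simplified sketch under stated assumptions, not ZooKeeper's actual protocol or API: the LockService class, the 40-second timeout, the client names, and the logical clock values are all hypothetical.

```python
class LockService:
    """Toy stand-in for a coordinator that ties a lock to a session lease."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.owner = None
        self.last_heartbeat = None

    def _expire_if_stale(self, now):
        # If the owner has been silent for too long, drop its lock,
        # much like an ephemeral node disappearing with its session.
        if (self.owner is not None
                and now - self.last_heartbeat > self.session_timeout):
            self.owner = None

    def acquire(self, client, now):
        self._expire_if_stale(now)
        if self.owner is None:
            self.owner, self.last_heartbeat = client, now
            return True
        return False

    def heartbeat(self, client, now):
        self._expire_if_stale(now)
        if self.owner != client:
            return False  # the session is gone; the client must stop
        self.last_heartbeat = now
        return True


if __name__ == "__main__":
    svc = LockService(session_timeout=40)
    print(svc.acquire("master-a", now=0))     # master-a takes the lock
    print(svc.heartbeat("master-a", now=10))  # last heartbeat before the pause
    # master-a is frozen by a stop-the-world GC from t=10 until t=70.
    print(svc.acquire("master-b", now=55))    # lease is stale, master-b takes over
    print(svc.heartbeat("master-a", now=70))  # master-a wakes up too late
```

Between t=55 and t=70 the paused client has not yet learned that its lease is gone. In the log above, the HMaster aborts as soon as it reconnects and is told that its session has expired.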
