HBase error summary

A while back, one or two machines in the HBase cluster would occasionally go down:

9:45:45.455 PM    WARN    org.apache.hadoop.hbase.util.Sleeper
We slept 44340ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
9:45:45.455 PM    WARN    org.apache.hadoop.hbase.util.Sleeper
We slept 44338ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
9:45:45.455 PM    INFO    org.apache.zookeeper.ClientCnxn
Client session timed out, have not heard from server in 54340ms for sessionid 0x34859dbfa060904, closing socket connection and attempting reconnect
9:45:45.454 PM    WARN    org.apache.hadoop.ipc.RpcServer
RpcServer.handler=18,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
9:45:59.606 PM    INFO    org.apache.zookeeper.ClientCnxn
Client session timed out, have not heard from server in 54339ms for sessionid 0x1485cee1d5a03ec, closing socket connection and attempting reconnect
9:46:14.061 PM    WARN    org.apache.hadoop.ipc.RpcServer
RpcServer.respondercallId: 18 service: ClientService methodName: Get size: 89 connection: 10.0.2.182:50259: output error
9:46:14.061 PM    WARN    org.apache.hadoop.hbase.util.JvmPauseMonitor
Detected pause in JVM or host machine (eg GC): pause of approximately 13957ms
No GCs detected
9:46:14.061 PM    WARN    org.apache.hadoop.hdfs.DFSClient
DFSOutputStream ResponseProcessor exception  for block BP-1813023907-10.0.2.161-1384842743529:blk_1112934385_1099559139217
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1987)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:796)
9:46:14.063 PM    WARN    org.apache.hadoop.ipc.RpcServer
RpcServer.handler=16,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
9:46:14.062 PM    WARN    org.apache.hadoop.ipc.RpcServer
RpcServer.respondercallId: 18 service: ClientService methodName: Get size: 89 connection: 10.0.2.182:50298: output error
9:46:14.065 PM    WARN    org.apache.hadoop.hdfs.DFSClient
Error Recovery for block BP-1813023907-10.0.2.161-1384842743529:blk_1112934385_1099559139217 in pipeline 10.0.2.182:50010, 10.0.2.172:50010: bad datanode 10.0.2.182:50010
9:46:14.065 PM    WARN    org.apache.hadoop.ipc.RpcServer
RpcServer.handler=5,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
9:46:14.062 PM    FATAL    org.apache.hadoop.hbase.regionserver.HRegionServer
ABORTING region server stat182,60020,1410352355225: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing stat182,60020,1410352355225 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
Background:

    16 regionservers, 5 ZooKeeper nodes, and a hot-standby HMaster.
    A real-time processing application continuously writes data into HBase tables. The HBase cluster shares its machines with HDFS, Hive, and Spark, and roughly 12 hours of analysis jobs run on the cluster every day.
    Cause analysis:

        Compactions and splits were too frequent:

               Because the maximum HFile size in the configuration was set to 1 GB, compactions and splits ran very frequently and consumed a lot of resources. The resulting GC pauses were long enough to expire the ZooKeeper session (hence the YouAreDeadException above) and caused HDFS write errors, ultimately bringing the regionserver down. A sketch of the setting involved is shown below.
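
               The post doesn't name the property, but the "maximum HFile size" setting is presumably hbase.hregion.max.filesize (that property name is my assumption). A minimal hbase-site.xml fragment with the 1 GB value described above would look like the sketch below: once any store in a region grows past this limit the region is split, and the many small regions that result keep compactions running almost constantly.

               <!-- hbase-site.xml: assumed form of the setting described in the analysis -->
               <property>
                 <name>hbase.hregion.max.filesize</name>
                 <value>1073741824</value>  <!-- 1 GB: a region splits as soon as one of its stores exceeds this -->
               </property>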
         
        Datanodes were overloaded:

                Because the cluster runs all kinds of jobs, HDFS read/write pressure is high and the datanodes carry a heavy load, which causes the regionserver to hit HDFS write exceptions and go down.

         Solution: 1. Raise the maximum HFile size to 100 GB and stop HBase from triggering major compactions on its own. 2. Give the datanodes more memory and tune the number of RPC handler threads. A hedged configuration sketch of both changes follows below.
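
         A minimal sketch of what those two changes might look like in configuration. The property names and values are my assumptions, since the post only describes the changes in words; in particular, "RPC thread count" is read here as the datanode handler count (dfs.datanode.handler.count), though it could also mean hbase.regionserver.handler.count on the HBase side.

         <!-- hbase-site.xml -->
         <property>
           <name>hbase.hregion.max.filesize</name>
           <value>107374182400</value>  <!-- 100 GB: size-based splits effectively never trigger -->
         </property>
         <property>
           <name>hbase.hregion.majorcompaction</name>
           <value>0</value>  <!-- disable time-triggered major compactions -->
         </property>

         <!-- hdfs-site.xml -->
         <property>
           <name>dfs.datanode.handler.count</name>
           <value>20</value>  <!-- more datanode RPC handler threads; the default is 10 -->
         </property>

         # hadoop-env.sh: a larger datanode heap (the value is illustrative)
         export HADOOP_DATANODE_OPTS="-Xmx4g $HADOOP_DATANODE_OPTS"

         With size-based splits effectively off and periodic major compactions disabled, tables need to be pre-split at creation time, and major compactions have to be scheduled manually during off-peak hours, e.g. major_compact 'table_name' from the HBase shell.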


Reposted from blog.csdn.net/sujins5288/article/details/85777718