Try restarting the node
On the HMaster node, run:
bin/graceful_stop.sh --restart --reload --debug '<hostname>'
On the failed node, run:
bin/hbase-daemon.sh start regionserver
Both approaches failed.
Check the regionserver log
less -R hbase-hadoop-regionserver-xxx.log
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
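When the regionserver log is long, a quick tally of exception classes can help locate traces like the one above. The following is only a sketch: count_exceptions is a hypothetical helper name, it assumes a grep that supports -o (GNU and BSD grep both do), and it only recognizes fully qualified class names ending in Exception or Error.

```shell
#!/bin/sh
# count_exceptions LOGFILE
# Prints "<count> <class>" for each fully qualified exception class
# found in LOGFILE, most frequent first.
# Hypothetical helper, not part of HBase.
count_exceptions() {
  grep -oE '([A-Za-z_$][A-Za-z0-9_$]*\.)+[A-Za-z_$][A-Za-z0-9_$]*(Exception|Error)' "$1" \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'   # normalize the padding added by uniq -c
}

# Usage (log file name as in the text above):
#   count_exceptions hbase-hadoop-regionserver-xxx.log
```

Stack frames (the lines starting with "at") are ignored because they do not end in Exception or Error, so each trace is counted once per exception line.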
The stack trace shows the problem is related to ZooKeeper, so check the ZooKeeper log
less -R zookeeper.log
Too many connections from... max is 3000
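To see which client addresses are holding the connections, one option is to summarize the output of ZooKeeper's cons four-letter command by client IP. This is a sketch under several assumptions: the host name zk-host and port 2181 are placeholders, nc is installed, the cons command is enabled (newer ZooKeeper releases require it to be listed in 4lw.commands.whitelist), and each connection line looks like " /10.0.0.1:54321[1](queued=0,...)". Only IPv4 addresses are handled.

```shell
#!/bin/sh
# count_conns_per_ip
# Reads `cons` output on stdin and prints "<count> <ip>" per client
# address, busiest client first. Hypothetical helper name.
count_conns_per_ip() {
  # Split on "/" and ":" so that field 2 is the client IP.
  awk -F'[/:]' '
    /^ *\/[0-9.]+:[0-9]+\[/ { n[$2]++ }
    END { for (ip in n) print n[ip], ip }
  ' | sort -rn
}

# Usage (zk-host and 2181 are assumptions, adjust to your ensemble):
#   echo cons | nc zk-host 2181 | count_conns_per_ip
```

A client whose count sits at or near the configured limit is the likely source of the "Too many connections" message.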
Check the ZooKeeper configuration
vim zoo.cfg
/maxClientCnxns
The search finds:
# increase this if you need to handle more clients
maxClientCnxns=2000
maxSessionTimeout=1200000
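Solution: increase maxClientCnxns to suit your workload and the state of the cluster. This parameter caps the number of concurrent connections that a single client, identified by IP address, may open to a single ZooKeeper server. Setting it to 0 removes the limit entirely; use that with caution, since keeping some limit is generally safer.
Restart the ZooKeeper servers one at a time. A rolling restart like this restarts the whole ZooKeeper ensemble without interrupting service.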
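Instead of editing the file by hand, the change can be scripted so that every ZooKeeper server gets the same value. The helper below is a minimal sketch: set_max_client_cnxns is a hypothetical name, the value 4000 in the usage line is only an example, and the path to zoo.cfg depends on your installation.

```shell
#!/bin/sh
# set_max_client_cnxns FILE VALUE
# Sets maxClientCnxns=VALUE in FILE, replacing an existing entry or
# appending one if the key is absent. Hypothetical helper name.
set_max_client_cnxns() {
  file=$1
  value=$2
  tmp=$(mktemp) || return 1
  if grep -q '^maxClientCnxns=' "$file"; then
    sed "s/^maxClientCnxns=.*/maxClientCnxns=$value/" "$file" > "$tmp"
  else
    { cat "$file"; echo "maxClientCnxns=$value"; } > "$tmp"
  fi
  # Write back through the original file so its owner and mode are kept.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (4000 is an example value, not a recommendation):
#   set_max_client_cnxns conf/zoo.cfg 4000
```

The new value only takes effect after the ZooKeeper server is restarted.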
bin/zkServer.sh restart
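The one-at-a-time restart can be sketched as a small loop that restarts a server and waits for it to answer before moving on. Everything environment-specific here is an assumption: the host names, the ZK_HOME path, passwordless ssh, port 2181, and the availability of nc and of the ruok four-letter command (which newer ZooKeeper releases require to be listed in 4lw.commands.whitelist). The helper names restart_zk, zk_is_ok and rolling_restart are hypothetical.

```shell
#!/bin/sh
# Assumed install path on the ZooKeeper hosts; adjust as needed.
ZK_HOME=${ZK_HOME:-/opt/zookeeper}

# Restart ZooKeeper on one host over ssh.
restart_zk() { ssh "$1" "cd $ZK_HOME && bin/zkServer.sh restart"; }

# Succeeds when the server answers "imok" to the ruok command.
zk_is_ok() { [ "$(echo ruok | nc "$1" 2181)" = "imok" ]; }

# rolling_restart HOST...
# Restarts each host in turn and waits (up to about 60 s) for it to
# respond again; stops at the first host that fails.
rolling_restart() {
  for host in "$@"; do
    echo "restarting $host"
    restart_zk "$host" || return 1
    tries=0
    until zk_is_ok "$host"; do
      tries=$((tries + 1))
      if [ "$tries" -ge 30 ]; then
        echo "$host did not come back" >&2
        return 1
      fi
      sleep 2
    done
  done
}

# Usage (host names are placeholders):
#   rolling_restart zk1 zk2 zk3
```

Note that ruok only confirms the server process is running. A stricter check could inspect the Mode line of the srvr command to confirm the server has rejoined the quorum before continuing.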
Once ZooKeeper has finished starting, restart the regionserver node
On the HMaster node, run:
bin/graceful_stop.sh --restart --reload --debug '<hostname>'