ZooKeeper fails to start: Unable to load database on disk

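Recently our cluster was in an abnormal state: ZooKeeper (ZK) would start, hang on for about five seconds, and then die, with the failure being reported every minute. It was painful to watch, and at first it was hard to see why ZK would not start. Thinking it over, the timing itself is a clue: the process fails very consistently at around the 5 s mark, and during that window it is never fully up; at best it is still in its init state.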

[root@ZYC3-AQGK-LJCL-SRV05 deployer]# systemctl status zookeeper
● zookeeper.service - ZooKeeper Service
   Loaded: loaded (/etc/systemd/system/zookeeper.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-03-23 11:29:21 CST; 4s ago
     Docs: http://zookeeper.apache.org
  Process: 31011 ExecStop=/opt/zookeeper/zookeeper-prod/bin/zkServer.sh stop /opt/zookeeper/zookeeper-prod/conf/zoo.cfg (code=exited, status=0/SUCCESS)
  Process: 31129 ExecStart=/opt/zookeeper/zookeeper-prod/bin/zkServer.sh start /opt/zookeeper/zookeeper-prod/conf/zoo.cfg (code=exited, status=0/SUCCESS)
 Main PID: 31138 (java)
   CGroup: /system.slice/zookeeper.service
           └─31138 java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /opt/zookeeper/zookeeper-prod/bin/../build/classes:/opt/zookeeper/zookeeper-prod/bin/../build/lib/*.jar:/opt/zookeeper/zoo...

Mar 23 11:29:20 ZYC3-AQGK-LJCL-SRV05 systemd[1]: Starting ZooKeeper Service...
Mar 23 11:29:20 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31129]: ZooKeeper JMX enabled by default
Mar 23 11:29:20 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31129]: Using config: /opt/zookeeper/zookeeper-prod/conf/zoo.cfg
Mar 23 11:29:21 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31129]: Starting zookeeper ... STARTED
Mar 23 11:29:21 ZYC3-AQGK-LJCL-SRV05 systemd[1]: Started ZooKeeper Service.
[root@ZYC3-AQGK-LJCL-SRV05 deployer]# systemctl status zookeeper
● zookeeper.service - ZooKeeper Service
   Loaded: loaded (/etc/systemd/system/zookeeper.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-03-23 11:29:21 CST; 5s ago
     Docs: http://zookeeper.apache.org
  Process: 31011 ExecStop=/opt/zookeeper/zookeeper-prod/bin/zkServer.sh stop /opt/zookeeper/zookeeper-prod/conf/zoo.cfg (code=exited, status=0/SUCCESS)
  Process: 31129 ExecStart=/opt/zookeeper/zookeeper-prod/bin/zkServer.sh start /opt/zookeeper/zookeeper-prod/conf/zoo.cfg (code=exited, status=0/SUCCESS)
 Main PID: 31138 (java)
   CGroup: /system.slice/zookeeper.service
           └─31138 java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /opt/zookeeper/zookeeper-prod/bin/../build/classes:/opt/zookeeper/zookeeper-prod/bin/../build/lib/*.jar:/opt/zookeeper/zoo...

Mar 23 11:29:20 ZYC3-AQGK-LJCL-SRV05 systemd[1]: Starting ZooKeeper Service...
Mar 23 11:29:20 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31129]: ZooKeeper JMX enabled by default
Mar 23 11:29:20 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31129]: Using config: /opt/zookeeper/zookeeper-prod/conf/zoo.cfg
Mar 23 11:29:21 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31129]: Starting zookeeper ... STARTED
Mar 23 11:29:21 ZYC3-AQGK-LJCL-SRV05 systemd[1]: Started ZooKeeper Service.
[root@ZYC3-AQGK-LJCL-SRV05 deployer]# systemctl status zookeeper
● zookeeper.service - ZooKeeper Service
   Loaded: loaded (/etc/systemd/system/zookeeper.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2020-03-23 11:29:26 CST; 706ms ago
     Docs: http://zookeeper.apache.org
  Process: 31225 ExecStop=/opt/zookeeper/zookeeper-prod/bin/zkServer.sh stop /opt/zookeeper/zookeeper-prod/conf/zoo.cfg (code=exited, status=0/SUCCESS)
  Process: 31129 ExecStart=/opt/zookeeper/zookeeper-prod/bin/zkServer.sh start /opt/zookeeper/zookeeper-prod/conf/zoo.cfg (code=exited, status=0/SUCCESS)
 Main PID: 31138 (code=exited, status=1/FAILURE)

Mar 23 11:29:20 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31129]: Using config: /opt/zookeeper/zookeeper-prod/conf/zoo.cfg
Mar 23 11:29:21 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31129]: Starting zookeeper ... STARTED
Mar 23 11:29:21 ZYC3-AQGK-LJCL-SRV05 systemd[1]: Started ZooKeeper Service.
Mar 23 11:29:26 ZYC3-AQGK-LJCL-SRV05 systemd[1]: zookeeper.service: main process exited, code=exited, status=1/FAILURE
Mar 23 11:29:26 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31225]: ZooKeeper JMX enabled by default
Mar 23 11:29:26 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31225]: Using config: /opt/zookeeper/zookeeper-prod/conf/zoo.cfg
Mar 23 11:29:26 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31225]: Stopping zookeeper ... /opt/zookeeper/zookeeper-prod/bin/zkServer.sh: line 182: kill: (31138) - No such process
Mar 23 11:29:26 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[31225]: STOPPED
Mar 23 11:29:26 ZYC3-AQGK-LJCL-SRV05 systemd[1]: Unit zookeeper.service entered failed state.
Mar 23 11:29:26 ZYC3-AQGK-LJCL-SRV05 systemd[1]: zookeeper.service failed.
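
The roughly five-second lifetime can be read straight off the journal timestamps above. Here is a minimal Python sketch of that calculation, assuming the two relevant journal lines are pasted into a string; the helper names parse_time and seconds_between are made up for illustration and are not part of any tool.

```python
from datetime import datetime

# Two journal lines copied from the systemctl output above.
JOURNAL = """\
Mar 23 11:29:21 ZYC3-AQGK-LJCL-SRV05 systemd[1]: Started ZooKeeper Service.
Mar 23 11:29:26 ZYC3-AQGK-LJCL-SRV05 systemd[1]: zookeeper.service: main process exited, code=exited, status=1/FAILURE
"""

def parse_time(line: str) -> datetime:
    # The first three whitespace-separated fields are month, day and HH:MM:SS.
    # The journal omits the year, so a fixed one is supplied (assumption: 2020,
    # taken from the "Active:" lines above).
    stamp = " ".join(line.split()[:3])
    return datetime.strptime("2020 " + stamp, "%Y %b %d %H:%M:%S")

def seconds_between(journal: str, start_marker: str, end_marker: str) -> float:
    """Seconds from the first line containing start_marker to the next line containing end_marker."""
    start = end = None
    for line in journal.splitlines():
        if start is None and start_marker in line:
            start = parse_time(line)
        elif start is not None and end_marker in line:
            end = parse_time(line)
            break
    if start is None or end is None:
        raise ValueError("markers not found in order")
    return (end - start).total_seconds()

lifetime = seconds_between(JOURNAL, "Started ZooKeeper Service", "main process exited")
print(lifetime)
```

A process that exits this quickly and this consistently is more likely failing during initialization than during normal operation, which is why the next step is to look for its startup log.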

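The log mentions /opt/zookeeper/zookeeper-prod/conf/zoo.cfg, so I went to that directory tree to see whether there was anything worth digging into. conf is evidently the configuration folder, so it is reasonable to guess that there is also a folder such as log or output that holds the runtime logs, and in particular the error logs that matter for the investigation. Following that idea, I went looking.
It turned out that /opt/zookeeper/zookeeper-prod/bin contains a zookeeper.out file, which holds the execution details. A quick cat of it makes the problem very clear: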

2020-03-23 11:36:58,799 [myid:] - INFO  [main:QuorumPeerConfig@136] - Reading configuration from: /opt/zookeeper/zookeeper-prod/bin/../conf/zoo.cfg
2020-03-23 11:36:58,814 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: 10.153.115.26 to address: /10.153.115.26
2020-03-23 11:36:58,815 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: 10.153.115.25 to address: /10.153.115.25
2020-03-23 11:36:58,816 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: 10.153.115.24 to address: /10.153.115.24
2020-03-23 11:36:58,816 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: 10.153.115.29 to address: /10.153.115.29
2020-03-23 11:36:58,816 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: 10.153.115.28 to address: /10.153.115.28
2020-03-23 11:36:58,816 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: 10.153.115.27 to address: /10.153.115.27
2020-03-23 11:36:58,816 [myid:] - WARN  [main:QuorumPeerConfig@354] - Non-optimial configuration, consider an odd number of servers.
2020-03-23 11:36:58,816 [myid:] - INFO  [main:QuorumPeerConfig@398] - Defaulting to majority quorums
2020-03-23 11:36:58,821 [myid:5] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2020-03-23 11:36:58,821 [myid:5] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 24
2020-03-23 11:36:58,822 [myid:5] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2020-03-23 11:36:58,837 [myid:5] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2020-03-23 11:36:58,839 [myid:5] - INFO  [main:QuorumPeerMain@130] - Starting quorum peer
2020-03-23 11:36:58,849 [myid:5] - INFO  [main:ServerCnxnFactory@117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2020-03-23 11:36:58,856 [myid:5] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2020-03-23 11:36:58,861 [myid:5] - INFO  [main:QuorumPeer@1158] - tickTime set to 2000
2020-03-23 11:36:58,861 [myid:5] - INFO  [main:QuorumPeer@1204] - initLimit set to 10
2020-03-23 11:36:58,861 [myid:5] - INFO  [main:QuorumPeer@1178] - minSessionTimeout set to -1
2020-03-23 11:36:58,862 [myid:5] - INFO  [main:QuorumPeer@1189] - maxSessionTimeout set to -1
2020-03-23 11:36:58,871 [myid:5] - INFO  [main:QuorumPeer@1467] - QuorumPeer communication is not secured!
2020-03-23 11:36:58,871 [myid:5] - INFO  [main:QuorumPeer@1496] - quorum.cnxn.threads.size set to 20
2020-03-23 11:36:58,872 [myid:5] - INFO  [main:FileSnap@86] - Reading snapshot /data/zookeeper/data/version-2/snapshot.b91d0000003c
2020-03-23 11:36:59,290 [myid:5] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
java.io.IOException: The accepted epoch, ba86 is less than the current epoch, ba87
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:689)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
2020-03-23 11:36:59,292 [myid:5] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server 
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
Caused by: java.io.IOException: The accepted epoch, ba86 is less than the current epoch, ba87
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:689)
    ... 4 more
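
To pull the essentials out of a log like this, it helps to filter for ERROR lines and extract the two epoch values from the exception message. The following Python sketch does that on lines copied from the log above. It assumes the epoch values are hexadecimal, as the letters in ba86 suggest; the helper names error_lines and epochs are hypothetical.

```python
import re

# Lines copied from zookeeper.out above (stack frames omitted).
LOG = """\
2020-03-23 11:36:58,872 [myid:5] - INFO  [main:FileSnap@86] - Reading snapshot /data/zookeeper/data/version-2/snapshot.b91d0000003c
2020-03-23 11:36:59,290 [myid:5] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
java.io.IOException: The accepted epoch, ba86 is less than the current epoch, ba87
2020-03-23 11:36:59,292 [myid:5] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
"""

EPOCH_RE = re.compile(
    r"The accepted epoch, ([0-9a-f]+) is less than the current epoch, ([0-9a-f]+)"
)

def error_lines(log_text):
    """Return only the lines logged at ERROR level."""
    return [line for line in log_text.splitlines() if " - ERROR " in line]

def epochs(log_text):
    """Return (accepted, current) parsed as hexadecimal, or None if the message is absent."""
    match = EPOCH_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)

errors = error_lines(LOG)
accepted, current = epochs(LOG)
print(len(errors), "ERROR lines")
print("accepted epoch is behind the current epoch by", current - accepted)
```

In this log the accepted epoch trails the current epoch by exactly one, which is the inconsistency the exception complains about.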

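Reading the log: ZK reads the snapshot and immediately afterwards fails with "Unable to load database on disk"; the exception text adds that the accepted epoch (ba86) is less than the current epoch (ba87). I deleted the snapshot snapshot.b91d0000003c, restarted, and let ZK regenerate the snapshot file on its own. The two key lines: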

2020-03-23 11:36:58,872 [myid:5] - INFO  [main:FileSnap@86] - Reading snapshot /data/zookeeper/data/version-2/snapshot.b91d0000003c
2020-03-23 11:36:59,290 [myid:5] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
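
The hexadecimal suffix of a snapshot file name is a ZooKeeper transaction ID (zxid), a 64-bit number whose high 32 bits are the epoch and whose low 32 bits are a counter within that epoch. A small Python sketch of the split, with split_zxid as a made-up helper name:

```python
def split_zxid(snapshot_name: str):
    """Split the hex zxid in a name like 'snapshot.<hex>' into (epoch, counter)."""
    suffix = snapshot_name.rsplit(".", 1)[1]
    zxid = int(suffix, 16)
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter

epoch, counter = split_zxid("snapshot.b91d0000003c")
print(hex(epoch), hex(counter))
```

For this file the epoch is b91d, whereas the exception message talks about epochs ba86 and ba87. The sketch only decodes the name; it does not by itself explain why loading failed.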

And what a relief:

[root@ZYC3-AQGK-LJCL-SRV05 deployer]# systemctl status zookeeper
● zookeeper.service - ZooKeeper Service
   Loaded: loaded (/etc/systemd/system/zookeeper.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-03-23 12:12:08 CST; 5min ago
     Docs: http://zookeeper.apache.org
  Process: 25348 ExecStop=/opt/zookeeper/zookeeper-prod/bin/zkServer.sh stop /opt/zookeeper/zookeeper-prod/conf/zoo.cfg (code=exited, status=0/SUCCESS)
  Process: 25658 ExecStart=/opt/zookeeper/zookeeper-prod/bin/zkServer.sh start /opt/zookeeper/zookeeper-prod/conf/zoo.cfg (code=exited, status=0/SUCCESS)
 Main PID: 25667 (java)
   CGroup: /system.slice/zookeeper.service
           └─25667 java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /opt/zookeeper/zookeeper-prod/bin/../build/classes:/opt/zookeeper/zookeeper-prod/bin/../build/lib/*.jar:/opt/zookeeper/zoo...

Mar 23 12:12:07 ZYC3-AQGK-LJCL-SRV05 systemd[1]: Starting ZooKeeper Service...
Mar 23 12:12:07 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[25658]: ZooKeeper JMX enabled by default
Mar 23 12:12:07 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[25658]: Using config: /opt/zookeeper/zookeeper-prod/conf/zoo.cfg
Mar 23 12:12:08 ZYC3-AQGK-LJCL-SRV05 zkServer.sh[25658]: Starting zookeeper ... STARTED
Mar 23 12:12:08 ZYC3-AQGK-LJCL-SRV05 systemd[1]: Started ZooKeeper Service.

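ZK writes snapshot files as it runs. Earlier, the disk on this host had filled up, so ZK could not write its data in time, and we then rebooted the machine. That is most likely the moment things went wrong: ZK was in the middle of recovering its state and the snapshot write failed. Note that this is not a universal fix; each case needs its own analysis, because once the snapshot file is deleted it is hard to restore the database to its earlier state. Fortunately, we run six ZK nodes and the other five were healthy, so within this cluster the operation was acceptable.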
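
The reasoning about the other five nodes rests on majority quorums, which the startup log reports with "Defaulting to majority quorums". Below is a sketch of that arithmetic in Python, assuming a plain majority rule with all servers voting; the function names are illustrative only.

```python
def majority(n: int) -> int:
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """How many servers can be down while a majority is still reachable."""
    return n - majority(n)

for n in (5, 6):
    print(n, "servers: quorum", majority(n), "tolerates", tolerated_failures(n), "down")
```

With six servers the quorum is four, so one node being rebuilt leaves the ensemble comfortably above quorum. Six servers tolerate no more failures than five do, which is what the "consider an odd number of servers" warning in the log is pointing at.
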
Let's call it a day :)

Reposted from: blog.51cto.com/yerikyu/2481123