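mongos fails to start in a MongoDB replica set

Background

The project stores its data in a MongoDB replica set. Unexpected power outages frequently leave the mongos on one of the nodes unable to start.

Environment

MongoDB replica set:
mongo01: 192.168.36.218
mongo02: 192.168.36.219
mongo03: 192.168.36.220

Error message

mongos on the mongo03 node does not start, and the start command fails with: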

[root@localhost ~]# mongos --configdb 192.168.36.218:20000,192.168.36.219:20000,192.168.36.220:20000 --port 30000 --chunkSize 500 --logpath /home/mongo/logs/mongos.log --logappend --fork
about to fork child process, waiting until server is ready for connections.
forked process: 79748
ERROR: child process failed, exited with error number 5

mongos.log contains the following errors:

2018-06-25T09:13:47.607+0800 I CONTROL  [main] ***** SERVER RESTARTED *****
2018-06-25T09:13:47.612+0800 I CONTROL  [main] ** WARNING: You are running this process as the root user, which is not recommended.
2018-06-25T09:13:47.613+0800 I CONTROL  [main] 
2018-06-25T09:13:47.613+0800 I SHARDING [mongosMain] MongoS version 3.2.1 starting: pid=80904 port=30000 64-bit host=MongoDB03 (--help for usage)
2018-06-25T09:13:47.613+0800 I CONTROL  [mongosMain] db version v3.2.1
2018-06-25T09:13:47.613+0800 I CONTROL  [mongosMain] git version: a14d55980c2cdc565d4704a7e3ad37e4e535c1b2
2018-06-25T09:13:47.613+0800 I CONTROL  [mongosMain] allocator: tcmalloc
2018-06-25T09:13:47.613+0800 I CONTROL  [mongosMain] modules: none
2018-06-25T09:13:47.613+0800 I CONTROL  [mongosMain] build environment:
2018-06-25T09:13:47.613+0800 I CONTROL  [mongosMain]     distarch: x86_64
2018-06-25T09:13:47.613+0800 I CONTROL  [mongosMain]     target_arch: x86_64
2018-06-25T09:13:47.613+0800 I CONTROL  [mongosMain] options: { net: { port: 30000 }, processManagement: { fork: true }, sharding: { chunkSize: 500, configDB: "192.168.36.218:20000,192.168.36.219:20000,192.168.36.220:20000" }, systemLog: { destination: "file", logAppend: true, path: "/home/mongo/logs/mongos.log" } }
2018-06-25T09:13:47.613+0800 I SHARDING [mongosMain] Updating config server connection string to: 192.168.36.218:20000,192.168.36.219:20000,192.168.36.220:20000
2018-06-25T09:13:47.625+0800 W SHARDING [mongosMain] config servers 192.168.36.218:20000 and 192.168.36.220:20000 differ
2018-06-25T09:13:47.627+0800 W SHARDING [mongosMain] config servers 192.168.36.218:20000 and 192.168.36.220:20000 differ
2018-06-25T09:13:47.628+0800 W SHARDING [mongosMain] config servers 192.168.36.218:20000 and 192.168.36.220:20000 differ
2018-06-25T09:13:47.630+0800 W SHARDING [mongosMain] config servers 192.168.36.218:20000 and 192.168.36.220:20000 differ
2018-06-25T09:13:47.630+0800 E SHARDING [mongosMain] Error initializing sharding system: ConfigServersInconsistent hash from 192.168.36.218:20000: { chunks: "d41d8cd98f00b204e9800998ecf8427e", databases: "95954cb16c029767f4ad050712a28f49", shards: "68f4b37fec8c2ac97cc985aa01f37717", version: "b25e55c19a8c75c87b4f950dcf5eb088" } vs hash from 192.168.36.220:20000: {}

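The key line in the log above is:

config servers 192.168.36.218:20000 and 192.168.36.220:20000 differ

The config server 192.168.36.218:20000 and the damaged config server 192.168.36.220:20000 are inconsistent. In other words, the config data on mongo01:20000 and mongo03:20000 differ. The final error line confirms this: 192.168.36.218:20000 returns hashes for chunks, databases, shards and version, while 192.168.36.220:20000 returns an empty hash ({}).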
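When the log is long, it can help to pull out just the config server pairs that mongos reported as differing. The following is a minimal sketch, not part of the original procedure; the function name differing_config_servers is made up for illustration, and it assumes the log lines have the same "config servers A and B differ" wording shown above.

```shell
# Hypothetical helper: print each unique pair of config servers that
# mongos reported as differing, one pair per line.
differing_config_servers() {
    # $1: path to a mongos log file
    grep 'config servers .* differ' "$1" \
        | sed -E 's/.*config servers ([^ ]+) and ([^ ]+) differ.*/\1 \2/' \
        | sort -u
}
```

For example, `differing_config_servers /home/mongo/logs/mongos.log` would list the pair once even though the warning is repeated several times in the log.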

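Fix

How do we fix this? One approach is to import the config database from mongo01:20000 into the config database on mongo03:20000, which resolves the inconsistency above.

Procedure

Step 1: back up the config database on mongo01:20000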

[root@localhost ~]#  mongodump --host 192.168.36.218:20000 -d config -o /home/config
2018-06-25T10:06:06.225+0800    writing config.actionlog to 
2018-06-25T10:06:06.226+0800    writing config.locks to 
2018-06-25T10:06:06.226+0800    writing config.mongos to 
2018-06-25T10:06:06.226+0800    writing config.lockpings to 
2018-06-25T10:06:06.227+0800    done dumping config.locks (3 documents)
2018-06-25T10:06:06.228+0800    done dumping config.lockpings (2 documents)
2018-06-25T10:06:06.229+0800    done dumping config.mongos (3 documents)
2018-06-25T10:06:06.229+0800    writing config.shards to 
2018-06-25T10:06:06.229+0800    writing config.settings to 
2018-06-25T10:06:06.229+0800    writing config.version to 
2018-06-25T10:06:06.230+0800    done dumping config.shards (1 document)
2018-06-25T10:06:06.230+0800    writing config.databases to 
2018-06-25T10:06:06.230+0800    done dumping config.settings (1 document)
2018-06-25T10:06:06.230+0800    done dumping config.version (1 document)
2018-06-25T10:06:06.230+0800    writing config.changelog to 
2018-06-25T10:06:06.230+0800    writing config.chunks to 
2018-06-25T10:06:06.231+0800    done dumping config.databases (1 document)
2018-06-25T10:06:06.231+0800    writing config.tags to 
2018-06-25T10:06:06.232+0800    done dumping config.chunks (0 documents)
2018-06-25T10:06:06.232+0800    done dumping config.changelog (1 document)
2018-06-25T10:06:06.232+0800    done dumping config.tags (0 documents)
2018-06-25T10:06:06.355+0800    done dumping config.actionlog (8160 documents)
[root@localhost ~]# 

Step 2: import the backed-up config database into mongo03:20000

[root@localhost ~]#  mongorestore --host 192.168.36.220:20000 -d config /home/config/config
2018-06-25T10:08:09.136+0800    building a list of collections to restore from /home/config/config dir
2018-06-25T10:08:09.162+0800    reading metadata for config.actionlog from /home/config/config/actionlog.metadata.json
2018-06-25T10:08:09.163+0800    reading metadata for config.locks from /home/config/config/locks.metadata.json
2018-06-25T10:08:09.163+0800    reading metadata for config.changelog from /home/config/config/changelog.metadata.json
2018-06-25T10:08:09.164+0800    reading metadata for config.mongos from /home/config/config/mongos.metadata.json
2018-06-25T10:08:09.164+0800    restoring config.locks from /home/config/config/locks.bson
2018-06-25T10:08:09.165+0800    restoring config.mongos from /home/config/config/mongos.bson
2018-06-25T10:08:09.175+0800    error: multiple errors in bulk operation:
  - E11000 duplicate key error collection: config.mongos index: _id_ dup key: { : "MongoDB01:30000" }
  - E11000 duplicate key error collection: config.mongos index: _id_ dup key: { : "MongoDB02:30000" }

2018-06-25T10:08:09.175+0800    restoring indexes for collection config.mongos from metadata
2018-06-25T10:08:09.204+0800    finished restoring config.mongos (3 documents)
2018-06-25T10:08:09.204+0800    restoring indexes for collection config.locks from metadata
2018-06-25T10:08:09.204+0800    reading metadata for config.lockpings from /home/config/config/lockpings.metadata.json
2018-06-25T10:08:09.205+0800    restoring config.lockpings from /home/config/config/lockpings.bson
2018-06-25T10:08:09.215+0800    restoring config.actionlog from /home/config/config/actionlog.bson
2018-06-25T10:08:09.269+0800    restoring config.changelog from /home/config/config/changelog.bson
2018-06-25T10:08:09.284+0800    restoring indexes for collection config.changelog from metadata
2018-06-25T10:08:09.284+0800    finished restoring config.locks (3 documents)
2018-06-25T10:08:09.284+0800    reading metadata for config.shards from /home/config/config/shards.metadata.json
2018-06-25T10:08:09.284+0800    restoring config.shards from /home/config/config/shards.bson
2018-06-25T10:08:09.286+0800    finished restoring config.changelog (1 document)
2018-06-25T10:08:09.286+0800    reading metadata for config.version from /home/config/config/version.metadata.json
2018-06-25T10:08:09.286+0800    restoring config.version from /home/config/config/version.bson
2018-06-25T10:08:09.287+0800    error: multiple errors in bulk operation:
  - E11000 duplicate key error collection: config.lockpings index: _id_ dup key: { : "MongoDB01:30000:1523346496:1804289383" }
  - E11000 duplicate key error collection: config.lockpings index: _id_ dup key: { : "MongoDB02:30000:1529566939:1804289383" }

2018-06-25T10:08:09.287+0800    restoring indexes for collection config.lockpings from metadata
2018-06-25T10:08:09.297+0800    restoring indexes for collection config.shards from metadata
2018-06-25T10:08:09.318+0800    finished restoring config.lockpings (2 documents)
2018-06-25T10:08:09.318+0800    reading metadata for config.databases from /home/config/config/databases.metadata.json
2018-06-25T10:08:09.318+0800    restoring config.databases from /home/config/config/databases.bson
2018-06-25T10:08:09.330+0800    restoring indexes for collection config.version from metadata
2018-06-25T10:08:09.370+0800    finished restoring config.shards (1 document)
2018-06-25T10:08:09.370+0800    reading metadata for config.settings from /home/config/config/settings.metadata.json
2018-06-25T10:08:09.370+0800    restoring config.settings from /home/config/config/settings.bson
2018-06-25T10:08:09.372+0800    finished restoring config.version (1 document)
2018-06-25T10:08:09.372+0800    reading metadata for config.tags from /home/config/config/tags.metadata.json
2018-06-25T10:08:09.372+0800    restoring config.tags from /home/config/config/tags.bson
2018-06-25T10:08:09.374+0800    restoring indexes for collection config.tags from metadata
2018-06-25T10:08:09.382+0800    restoring indexes for collection config.databases from metadata
2018-06-25T10:08:09.400+0800    finished restoring config.tags (0 documents)
2018-06-25T10:08:09.400+0800    reading metadata for config.chunks from /home/config/config/chunks.metadata.json
2018-06-25T10:08:09.400+0800    restoring config.chunks from /home/config/config/chunks.bson
2018-06-25T10:08:09.409+0800    restoring indexes for collection config.settings from metadata
2018-06-25T10:08:09.409+0800    finished restoring config.databases (1 document)
2018-06-25T10:08:09.411+0800    finished restoring config.settings (1 document)
2018-06-25T10:08:09.555+0800    restoring indexes for collection config.chunks from metadata
2018-06-25T10:08:09.581+0800    finished restoring config.chunks (0 documents)
2018-06-25T10:08:09.754+0800    restoring indexes for collection config.actionlog from metadata
2018-06-25T10:08:09.754+0800    finished restoring config.actionlog (8160 documents)
2018-06-25T10:08:09.754+0800    done
[root@localhost ~]#  

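Note: I did this in a production environment. At the time I backed up the config database on both mongo01:20000 and mongo02:20000 and then imported both dumps, in order, into mongo03:20000 (using the same dump and restore commands as above). In principle, importing only the mongo01:20000 dump should be enough, but because this was production I did not test that; I will try it the next time the problem occurs. Readers are welcome to test it and report the result in the comments to help others who hit the same problem.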
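The two steps above can be wrapped in a small script so that they are easy to repeat. This is only a sketch under assumptions: the function names sync_config_db and run_or_print and the DRY_RUN variable are hypothetical, and the mongodump and mongorestore invocations are the same ones used in step 1 and step 2. With DRY_RUN=1 the commands are printed rather than executed, which is useful for checking them before touching production.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy the config database from a healthy config server to a damaged one.
sync_config_db() {
    src="$1"       # healthy config server, host:port
    dst="$2"       # damaged config server, host:port
    dump_dir="$3"  # directory that receives the dump
    # Step 1: dump the config database from the healthy server.
    run_or_print mongodump --host "$src" -d config -o "$dump_dir" || return 1
    # Step 2: restore the dump into the damaged server.
    run_or_print mongorestore --host "$dst" -d config "$dump_dir/config"
}
```

For example, `DRY_RUN=1 sync_config_db 192.168.36.218:20000 192.168.36.220:20000 /home/config` prints the two commands from the procedure above. The E11000 duplicate key errors in the restore output occur because some documents already existed on the target. mongorestore has a --drop option that drops each collection before restoring it, which would probably avoid those errors, but that is not what was run here and it has not been tested in this scenario.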

Try starting mongos on mongo03

[root@localhost ~]#  mongos --configdb 192.168.36.218:20000,192.168.36.219:20000,192.168.36.220:20000 --port 30000 --chunkSize 500 --logpath /home/mongo/logs/mongos.log --logappend --fork
about to fork child process, waiting until server is ready for connections.
forked process: 4286
child process started successfully, parent exiting
[root@localhost ~]# ps -ef | grep mongos
root     44760     1  0 Jun21 ?        00:25:14 mongos --configdb 192.168.36.218:20000,192.168.36.219:20000,192.168.36.220:20000 --port 30000 --chunkSize 500 --logpath /home/mongo/logs/mongos.log --logappend --fork
root     66128 66090  0 11:40 pts/0    00:00:00 grep mongos
[root@localhost ~]# 

As shown above, mongos starts successfully and keeps running in the background.
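Beyond checking that mongos starts, one way to confirm that two config servers now agree is to compare the hashes of the same four collections that appear in the error message (chunks, databases, shards, version). The sketch below is an assumption-laden illustration, not something from the original fix: it relies on the mongo shell and the dbHash database command, and the function names config_hashes and config_servers_match are invented for this example.

```shell
# Print the dbHash values of the collections named in the mongos error
# (chunks, databases, shards, version) for one config server.
config_hashes() {
    # $1: config server address, host:port
    mongo --quiet "$1/config" --eval '
        var c = db.runCommand({ dbHash: 1 }).collections;
        printjson({ chunks: c.chunks, databases: c.databases,
                    shards: c.shards, version: c.version });'
}

# Succeed (exit status 0) only if both servers report identical hashes.
config_servers_match() {
    [ "$(config_hashes "$1")" = "$(config_hashes "$2")" ]
}
```

A possible use is `config_servers_match 192.168.36.218:20000 192.168.36.220:20000 && echo consistent`. The comparison is plain string equality on the shell output, so it only makes sense when both servers are queried with the same shell version.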


Reposted from blog.51cto.com/10074802/2132394