Repairing Corrupted MongoDB Data Files: repair as a Lifesaver and as a Fatal Danger

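    Recently, a customer's standalone MongoDB instance, which had no backup, suffered data file corruption after a power failure. Because the business had to be back online as soon as possible and the data was not sensitive, the customer asked for the fastest recovery available, so we used MongoDB's automatic repair option (mongod --repair). The good news is that service was restored quickly. The bad news is that the data in the affected data files was lost, a loss the customer had accepted. This post summarizes the process and explains why the data is gone after a repair.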

1. Query the data on a healthy MongoDB instance

> show dbs;
admin       0.000GB
config      0.000GB
dns_testdb  0.009GB
local       0.000GB
> use dns_testdb
switched to db dns_testdb
> db.test_collection.find();
{ "_id" : ObjectId("5fedd03d9d2569ee04ab62e1"), "name" : "elephant", "user_id" : 0, "boolean" : false, "added_at" : ISODate("2020-12-31T13:21:01.226Z"), "number" : 5129 }
{ "_id" : ObjectId("5fedd03d9d2569ee04ab62e2"), "name" : "dog", "user_id" : 1, "boolean" : false, "added_at" : ISODate("2020-12-31T13:21:01.237Z"), "number" : 9699 }
{ "_id" : ObjectId("5fedd03d9d2569ee04ab62e3"), "name" : "lion", "user_id" : 2, "boolean" : false, "added_at" : ISODate("2020-12-31T13:21:01.238Z"), "number" : 1783 }
Type "it" for more
> 

2. Simulate data file corruption

[mongo@centos7 dns_testdb]$ du -sh *
28M collection-8--6736947369024546614.wt
9.5M index-9--6736947369024546614.wt
[mongo@centos7 dns_testdb]$ 
[mongo@centos7 dns_testdb]$ 
[mongo@centos7 dns_testdb]$ pwd
/opt/mongo/data/single/dns_testdb
[mongo@centos7 dns_testdb]$ dd if=/dev/null of=/opt/mongo/data/single/dns_testdb/collection-8--6736947369024546614.wt  bs=1024k count=5
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000132203 s, 0.0 kB/s
[mongo@centos7 dns_testdb]$
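
The dd output above reports 0 bytes copied: reading from /dev/null returns end-of-file immediately, and since dd truncates its output file by default, the collection file ends up empty no matter what bs and count say. The following is a minimal sketch that reproduces this effect on a throwaway temp file rather than on a real dbpath. The helper name truncate_like_step2 is hypothetical.

```shell
# Sketch: show that dd from /dev/null truncates the target to zero bytes.
# It only touches a scratch file created by mktemp, never a real dbpath.
truncate_like_step2() {
    # $1: path of the file to truncate (a scratch file in this sketch)
    dd if=/dev/null of="$1" bs=1024k count=5 2>/dev/null
}

scratch=$(mktemp)
printf 'pretend this is WiredTiger data\n' > "$scratch"
size_before=$(wc -c < "$scratch" | tr -d ' ')
truncate_like_step2 "$scratch"
size_after=$(wc -c < "$scratch" | tr -d ' ')
echo "before: $size_before bytes, after: $size_after bytes"
rm -f "$scratch"
```

An empty file has no WiredTiger header, which matches the "does not appear to be a WiredTiger file" error that shows up later in the log.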

3. Restart MongoDB

> use admin
switched to db admin
> db.shutdownServer();
[mongo@centos7 data]$ mongod --dbpath /opt/mongo/data/single --port 50001  --oplogSize 512  --fork --bind_ip 0.0.0.0 --logpath /opt/mongo/logs/single.log --logappend --journal --directoryperdb --profile=1
about to fork child process, waiting until server is ready for connections.
forked process: 102882
child process started successfully, parent exiting

4. The mongod process still starts, but any operation on the collection whose data file is corrupted crashes mongod

[mongo@centos7 data]$ mongo --port 50001
MongoDB shell version v4.2.3
connecting to: mongodb://127.0.0.1:50001/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("09b6c6aa-059d-4a41-9e0d-e6553966399b") }
MongoDB server version: 4.2.3
Server has startup warnings: 
> show dbs;
admin       0.000GB
config      0.000GB
dns_testdb  0.037GB
local       0.000GB
> use dns_testdb;
switched to db dns_testdb
> db.test_collection.find();
2020-12-31T08:43:45.115-0500 I  NETWORK  [js] DBClientConnection failed to receive message from 127.0.0.1:50001 - HostUnreachable: Connection closed by peer
Error: error doing query: failed: network error while attempting to run command 'find' on host '127.0.0.1:50001' 
2020-12-31T08:43:45.118-0500 I  NETWORK  [js] trying reconnect to 127.0.0.1:50001 failed
2020-12-31T08:43:45.118-0500 I  NETWORK  [js] reconnect 127.0.0.1:50001 failed failed 
> 

5. Check the MongoDB log: it reports the corrupted data file and suggests fixing it with repair

2020-12-31T08:43:45.103-0500 E  STORAGE  [conn1] WiredTiger error (-31802) [1609422225:103947][102882:0x7f96713b5700], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.open_cursor: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error Raw: [1609422225:103947][102882:0x7f96713b5700], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.open_cursor: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2020-12-31T08:43:45.104-0500 E  STORAGE  [conn1] Failed to open a WiredTiger cursor. Reason: UnknownError: -31802: WT_ERROR: non-specific WiredTiger error, uri: table:dns_testdb/collection-8--6736947369024546614, config: 
2020-12-31T08:43:45.104-0500 E  STORAGE  [conn1] This may be due to data corruption. Please read the documentation for starting MongoDB with --repair here: http://dochub.mongodb.org/core/repair
2020-12-31T08:43:45.104-0500 F  -        [conn1] Fatal Assertion 50882 at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 101
2020-12-31T08:43:45.104-0500 F  -        [conn1] 

***aborting after fassert() failure

6. Repair the database as suggested by the mongod log

[mongo@centos7 data]$ mongod --dbpath /opt/mongo/data/single --port 50001  --oplogSize 512  --fork --bind_ip 0.0.0.0 --logpath /opt/mongo/logs/single.log --logappend --journal --directoryperdb --profile=1 --repair
about to fork child process, waiting until server is ready for connections.
forked process: 102942
child process started successfully, parent exiting
[mongo@centos7 data]$ 
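
Because repair rewrites files in place, it is worth taking a file-level copy of the dbpath while mongod is stopped and before the command above is run (the MongoDB documentation for --repair gives the same advice). Below is a minimal sketch of such a copy. The helper name backup_dbpath and the backup directory are assumptions for illustration and were not part of the original procedure.

```shell
# Sketch: copy the whole dbpath to a timestamped directory before mongod --repair.
# backup_dbpath is a hypothetical helper; mongod must already be stopped.
backup_dbpath() {
    # $1: dbpath to copy, $2: directory that will hold the copy
    src=$1
    dest_root=$2
    dest="$dest_root/$(basename "$src").$(date +%Y%m%d%H%M%S)"
    mkdir -p "$dest_root" || return 1
    # -R copies recursively, -p preserves mode, ownership and timestamps
    cp -Rp "$src" "$dest" || return 1
    # Print where the copy went so the caller can record it
    echo "$dest"
}

# Usage (dbpath from this post; the backup location is an assumption):
#   backup_dbpath /opt/mongo/data/single /opt/mongo/backup
#   mongod --dbpath /opt/mongo/data/single --repair
```

A copy of already damaged files does not bring the data back, but it keeps every option open if the repair result turns out to be worse than expected.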

7. During the repair, the mongod log shows the corrupted collection and its index being rebuilt

2020-12-31T08:44:45.646-0500 I  STORAGE  [initandlisten] repairDatabase dns_testdb
2020-12-31T08:44:45.646-0500 I  STORAGE  [initandlisten] Repairing collection dns_testdb.test_collection
2020-12-31T08:44:45.647-0500 E  STORAGE  [initandlisten] WiredTiger error (-31802) [1609422285:647413][102942:0x7fca99ec8c40], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.verify: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error Raw: [1609422285:647413][102942:0x7fca99ec8c40], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.verify: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2020-12-31T08:44:45.647-0500 I  STORAGE  [initandlisten] Verify failed on uri table:dns_testdb/collection-8--6736947369024546614. Running a salvage operation.
2020-12-31T08:44:45.647-0500 E  STORAGE  [initandlisten] WiredTiger error (-31802) [1609422285:647930][102942:0x7fca99ec8c40], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.salvage: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error Raw: [1609422285:647930][102942:0x7fca99ec8c40], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.salvage: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2020-12-31T08:44:45.648-0500 W  STORAGE  [initandlisten] Salvage failed for uri table:dns_testdb/collection-8--6736947369024546614: Salvage failed: -31802: WT_ERROR: non-specific WiredTiger error. The file will be moved out of the way and a new ident will be created.
2020-12-31T08:44:45.648-0500 W  STORAGE  [initandlisten] Moving data file /opt/mongo/data/single/dns_testdb/collection-8--6736947369024546614.wt to backup as /opt/mongo/data/single/dns_testdb/collection-8--6736947369024546614.wt.corrupt
2020-12-31T08:44:45.648-0500 W  STORAGE  [initandlisten] Rebuilding ident dns_testdb/collection-8--6736947369024546614
2020-12-31T08:44:45.708-0500 I  STORAGE  [initandlisten] Successfully re-created table:dns_testdb/collection-8--6736947369024546614.
2020-12-31T08:44:45.718-0500 I  INDEX    [initandlisten] index build: starting on dns_testdb.test_collection properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "dns_testdb.test_collection" } using method: Foreground
2020-12-31T08:44:45.718-0500 I  INDEX    [initandlisten] build may temporarily use up to 200 megabytes of RAM
2020-12-31T08:44:45.718-0500 I  STORAGE  [initandlisten] Index build initialized: 2ddee833-ea97-4964-98c0-7137e71a99c9: dns_testdb.test_collection: indexes: 1
2020-12-31T08:44:45.722-0500 I  STORAGE  [initandlisten] Index builds manager starting: 2ddee833-ea97-4964-98c0-7137e71a99c9: dns_testdb.test_collection
2020-12-31T08:44:45.724-0500 I  INDEX    [initandlisten] index build: inserted 0 keys from external sorter into index in 0 seconds
2020-12-31T08:44:45.727-0500 I  INDEX    [initandlisten] index build: done building index _id_ on ns dns_testdb.test_collection
2020-12-31T08:44:45.727-0500 I  STORAGE  [initandlisten] Index builds manager completed successfully: 2ddee833-ea97-4964-98c0-7137e71a99c9: dns_testdb.test_collection. Index specs requested: 1. Indexes in catalog before build: 1. Indexes in catalog after build: 1
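
The "Moving data file ... to backup as ..." line is the quickest way to see which collections repair gave up on. Here is a minimal sketch that pulls those paths out of a log file. It relies only on the wording shown in the log above, and the function name list_corrupt_backups is hypothetical.

```shell
# Sketch: list the files that repair moved aside as *.corrupt.
# Assumes the log wording shown above ("Moving data file ... to backup as ...").
list_corrupt_backups() {
    # $1: path to the mongod log file
    grep 'Moving data file' "$1" | sed -n 's/.* to backup as \(.*\)$/\1/p'
}

# Usage (log path from this post):
#   list_corrupt_backups /opt/mongo/logs/single.log
```

Every path printed corresponds to a collection that was re-created empty, so an empty result would suggest that salvage did not have to fall back to rebuilding any file.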

8. Restart the mongod service after the repair

[mongo@centos7 data]$ mongod --dbpath /opt/mongo/data/single --port 50001  --oplogSize 512  --fork --bind_ip 0.0.0.0 --logpath /opt/mongo/logs/single.log --logappend --journal --directoryperdb --profile=1 
about to fork child process, waiting until server is ready for connections.
forked process: 102975
child process started successfully, parent exiting
[mongo@centos7 data]$ 

9. After startup, mongod serves queries normally again, but the data in the collection whose data file was corrupted is gone

[mongo@centos7 data]$ mongo --port 50001
MongoDB shell version v4.2.3
connecting to: mongodb://127.0.0.1:50001/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("d88894c4-16bf-4013-a993-d29e2493fbdf") }
MongoDB server version: 4.2.3
Server has startup warnings: 
> show dbs;
admin       0.000GB
config      0.000GB
dns_testdb  0.000GB
local       0.000GB
> use dns_testdb;
switched to db dns_testdb
> db.test_collection.find();
> 

10. Summary

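    When a data file is corrupted and there is no backup, MongoDB's repair option (mongod --repair) can leave the collection stored in that file completely empty, even though mongod starts normally afterwards. The repair log shows why. Both verify and salvage failed on the damaged file, so repair moved it aside as collection-8--6736947369024546614.wt.corrupt, created a new, empty data file under the same ident, and rebuilt the collection's index from that empty file (0 keys inserted). The data file and index file were in effect reinitialized, which is why the data is lost. For this reason it is best to run MongoDB as a replica set so that the data is stored redundantly, and to add a delayed member to the replica set as protection against operator mistakes.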
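
As an illustration of the delayed member mentioned above, here is a minimal sketch that builds the rs.add() call for a hidden member that lags behind the primary. It assumes the instance has already been converted to a replica set. The host name, the one-hour delay, and the helper name delayed_member_js are hypothetical. slaveDelay is the field name in MongoDB 4.2, the version used in this post (it was renamed secondaryDelaySecs in 5.0).

```shell
# Sketch: build the rs.add() call for a hidden, delayed replica set member.
# Host and delay are hypothetical; slaveDelay is the MongoDB 4.2 field name.
delayed_member_js() {
    # $1: host:port of the new member, $2: delay in seconds
    printf 'rs.add({ host: "%s", priority: 0, hidden: true, slaveDelay: %d })\n' "$1" "$2"
}

# Usage against the primary of an existing replica set (port from this post):
#   mongo --port 50001 --eval "$(delayed_member_js mongo-delayed.example:50001 3600)"
```

A delayed member has to use priority 0 so that it can never become primary, and hiding it keeps application reads away from stale data.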


Reposted from blog.csdn.net/www_xue_xi/article/details/112056757