MongoDB monitoring: database-level monitoring

2. The database level

2.1 db.serverStatus()

1. Lock information monitoring

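rs0:PRIMARY> db.serverStatus().globalLock
{
	"totalTime" : NumberLong("2651301900000"),      // microseconds since the database started and created the global lock
	"currentQueue" : {          // lock wait queue information
		"total" : 0,            // total number of operations queued waiting for a lock
		"readers" : 0,          // operations queued waiting for a read lock (kQueuedReader)
		"writers" : 0           // operations queued waiting for a write lock (kQueuedWriter)
	},
	"activeClients" : {         // active connection information
		"total" : 38,           // current number of active client connections
		"readers" : 0,          // active connections currently performing reads (kActiveReader)
		"writers" : 0           // active connections currently performing writes (kActiveWriter)
	}
}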

MongoDB lock modes ("+" means compatible, "-" means mutually exclusive):

Lock mode   MODE_NONE   MODE_IS   MODE_IX   MODE_S   MODE_X
MODE_NONE   +           +         +         +        +
MODE_IS     +           +         +         +        -
MODE_IX     +           +         +         -        -
MODE_S      +           +         -         +        -
MODE_X      +           -         -         -        -
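
As a rough illustration, the standard multi-granularity compatibility rules (intent shared coexists with shared, intent exclusive does not, exclusive coexists with nothing) can be written as a lookup. The names COMPATIBLE and isCompatible are made up for this sketch and are not part of MongoDB.

```javascript
// For each requested mode, the already-held modes it can coexist with.
// Hypothetical helper for illustration only; not a MongoDB API.
const COMPATIBLE = {
  MODE_NONE: ["MODE_NONE", "MODE_IS", "MODE_IX", "MODE_S", "MODE_X"],
  MODE_IS: ["MODE_NONE", "MODE_IS", "MODE_IX", "MODE_S"],
  MODE_IX: ["MODE_NONE", "MODE_IS", "MODE_IX"],
  MODE_S: ["MODE_NONE", "MODE_IS", "MODE_S"],
  MODE_X: ["MODE_NONE"],
};

// Returns true when a lock in mode `requested` can be granted
// while another holder has the same resource in mode `held`.
function isCompatible(requested, held) {
  return COMPATIBLE[requested].indexOf(held) !== -1;
}
```

For example, isCompatible("MODE_IX", "MODE_IX") is true, which is why two writers can hold the same collection lock at once and leave document-level conflicts to the storage engine.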

MongoDB acquires locks hierarchically:

globalLock --> DBLock --> CollectionLock

With the WiredTiger storage engine, MongoDB provides document-level concurrency. For concurrent reads and writes, the locks are acquired as follows:

Write operation

    1. globalLock MODE_IX  (this level only distinguishes reads from writes, not the specific lock mode)
    2. DBLock MODE_IX
    3. Collection MODE_IX
    4. pass request to wiredtiger

Read operation

    1. globalLock MODE_IS  (this level only distinguishes reads from writes, not the specific lock mode)
    2. DBLock MODE_IS
    3. Collection MODE_IS
    4. pass request to wiredtiger

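The overall process is as follows:

1. The client sends a request to MongoDB.
2. The client state is set to kQueuedReader or kQueuedWriter.
3. A ticket is acquired (this is done at the globalLock level).
    Normally, if there is no lock contention, all read and write requests are passed to the storage engine layer.
    To limit concurrency in the storage engine layer, the number of tickets can be configured.
    By default, WiredTiger limits the number of concurrent reads and the number of concurrent writes passed to the engine to 128 each.
    MMAPv1 has no ticket limit.
4. The client state changes to kActiveReader or kActiveWriter.
    If the number of active readers or writers stays non-zero for a long time, the service is handling high concurrency and is under heavy load.
    Consider optimizing queries or upgrading the instance.
5. lockBegin
    Hierarchical locks are taken on the DB, the collection, and so on.
    Lock contention at the lower levels indirectly affects globalLock.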
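
The ticket step can be pictured with a small toy model. This is only a sketch of the queued/active bookkeeping described above, assuming one pool per direction (reads or writes). TicketPool and its methods are hypothetical names and do not reflect how mongod implements tickets internally.

```javascript
// Toy model of one ticket pool (one pool for reads, another for writes).
// Hypothetical class for illustration; not MongoDB code.
class TicketPool {
  constructor(totalTickets) {
    this.available = totalTickets; // free tickets
    this.queued = 0; // plays the role of currentQueue.readers or .writers
    this.active = 0; // plays the role of activeClients.readers or .writers
  }

  // A request arrives: take a ticket if one is free, otherwise queue.
  arrive() {
    if (this.available > 0) {
      this.available -= 1;
      this.active += 1;
    } else {
      this.queued += 1;
    }
  }

  // An active request finishes: its ticket goes to a waiter if there is one.
  finish() {
    if (this.active === 0) return;
    if (this.queued > 0) {
      this.queued -= 1; // the waiter becomes active, so `active` is unchanged
    } else {
      this.active -= 1;
      this.available += 1;
    }
  }
}

const pool = new TicketPool(128);
for (let i = 0; i < 130; i++) pool.arrive();
// At this point pool.active is 128 and pool.queued is 2.
pool.finish();
// One waiter took over the freed ticket: pool.active is 128, pool.queued is 1.
```

The model shows why a non-empty queue implies that the active count is pinned at the ticket limit.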

To sum up:

    serverStatus.globalLock or mongostat (the qr|qw and ar|aw columns) shows the globalLock metrics of a mongod.

    WiredTiger limits the number of concurrent reads and the number of concurrent writes passed to the engine to 128 each (a reasonable empirical value that usually needs no adjustment). Requests beyond this threshold are queued and show up in globalLock.currentQueue.readers/writers.

    If globalLock.currentQueue.readers/writers stays non-zero for a long time (in which case globalLock.activeClients.readers/writers must be continuously at or near 128), either the system's concurrency is too high or some requests hold exclusive locks for a long time, such as a foreground index build. You can address this by reducing the processing time of individual requests (for example, adding indexes to avoid COLLSCAN or SORT stages) or by upgrading back-end resources (memory, disk I/O capacity, and CPU).

    If globalLock.activeClients.readers/writers stays non-zero (but below 128, so currentQueue is empty) and request processing already feels slow, look for specific slow queries and optimize them, or upgrade resources.
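
The rules in this summary can be turned into a small check over several samples taken some seconds apart. This is a minimal sketch: classifyGlobalLock and its three labels are invented for illustration, and each sample is assumed to have the shape of db.serverStatus().globalLock.

```javascript
// Classify a series of globalLock samples.
// "saturated": requests were queued in every sample.
// "busy": not saturated, but readers or writers were active in every sample.
// "ok": anything else.
// Hypothetical helper; the labels are not MongoDB terminology.
function classifyGlobalLock(samples) {
  if (samples.length === 0) return "ok";
  const queued = (s) => s.currentQueue.readers + s.currentQueue.writers;
  const active = (s) => s.activeClients.readers + s.activeClients.writers;
  if (samples.every((s) => queued(s) > 0)) return "saturated";
  if (samples.every((s) => active(s) > 0)) return "busy";
  return "ok";
}

// The sample output shown earlier has empty queues and no active readers or writers.
const quiet = {
  currentQueue: { total: 0, readers: 0, writers: 0 },
  activeClients: { total: 38, readers: 0, writers: 0 },
};
const verdict = classifyGlobalLock([quiet, quiet]); // "ok"
```

In the shell, the samples could be collected by calling db.serverStatus().globalLock at a fixed interval and pushing each result into an array.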

2. Connection information monitoring

rs0:PRIMARY> db.serverStatus().connections
{
    "current": 5,                       // current number of connections
    "available": 814,                   // number of connections still available
    "totalCreated": NumberLong(186)     // total number of connections created so far
}
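
A quick way to read these numbers is as a utilization ratio. The sketch below assumes that current plus available approximates the effective connection limit. connectionUsage is a hypothetical helper, not a shell built-in.

```javascript
// Derive the effective connection limit and the fraction in use
// from an object shaped like db.serverStatus().connections.
// Assumption: current + available approximates the effective limit.
function connectionUsage(connections) {
  const limit = connections.current + connections.available;
  const usedFraction = limit > 0 ? connections.current / limit : 0;
  return { limit: limit, usedFraction: usedFraction };
}

const usage = connectionUsage({ current: 5, available: 814, totalCreated: 186 });
// usage.limit is 819; usage.usedFraction is 5 / 819, well under 1%.
```

Alerting when usedFraction approaches 1 gives an early warning before new connections start being refused.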

3. Memory information monitoring

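rs0:PRIMARY> db.serverStatus().mem
{
	"bits" : 64,                    // 64-bit build
	"resident" : 245,               // physical memory in use (MiB)
	"virtual" : 1262,               // virtual memory in use (MiB)
	"supported" : true,             // whether extended memory information is supported
	"mapped" : 0,                   // mapped memory
	"mappedWithJournal" : 0         // mapped memory including the memory mapped for the journal
}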

4. Error information monitoring

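rs0:PRIMARY> db.serverStatus().asserts
{
    "regular": 0,           // number of regular asserts since the server started
    "warning": 0,           // number of warnings since the server started
    "msg": 0,               // number of message asserts since the server started
    "user": 22,             // number of user asserts since the server started
    "rollovers": 0          // number of times the counters have rolled over since the server started
}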

5. Network traffic monitoring

rs0:PRIMARY> db.serverStatus().network
{
	"bytesIn" : NumberLong(1013083142),     // inbound network traffic, in bytes
	"bytesOut" : NumberLong(1123552013),    // outbound network traffic, in bytes
	"numRequests" : NumberLong(3592562)     // cumulative number of requests
}
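
These counters are cumulative, so a monitoring system normally works with the difference between two samples. A minimal sketch, assuming the values have already been converted to plain JavaScript numbers and that the caller knows the interval between samples. networkRates is a hypothetical name.

```javascript
// Compute per-second rates from two samples shaped like db.serverStatus().network.
// Assumption: counter values are plain numbers, and `seconds` is the time between samples.
function networkRates(previous, current, seconds) {
  return {
    bytesInPerSec: (current.bytesIn - previous.bytesIn) / seconds,
    bytesOutPerSec: (current.bytesOut - previous.bytesOut) / seconds,
    requestsPerSec: (current.numRequests - previous.numRequests) / seconds,
  };
}

const rates = networkRates(
  { bytesIn: 1000, bytesOut: 2000, numRequests: 10 },
  { bytesIn: 3000, bytesOut: 2500, numRequests: 30 },
  10
);
// rates is { bytesInPerSec: 200, bytesOutPerSec: 50, requestsPerSec: 2 }
```

A negative rate would indicate that the server restarted between the two samples and the counters started over.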

2.2 db.stats()

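rs0:PRIMARY> db.stats()
{
	"db" : "test",                          // database name
	"collections" : 5,                      // number of collections in the database
	"objects" : 139,                        // estimated number of documents in the database
	"avgObjSize" : 63.65467625899281,       // average document size, in bytes
	"dataSize" : 8848,                      // current data size of the database, in bytes
	"storageSize" : 1077248,                // current physical storage size of the database, in bytes
	"numExtents" : 5,
	"indexes" : 2,
	"indexSize" : 16352,                    // index size, in bytes
	"fileSize" : 67108864,                  // size of the pre-allocated data files, in bytes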
	"nsSizeMB" : 16,
	"extentFreeList" : {
		"num" : 1,
		"totalSize" : 32768
	},
	"dataFileVersion" : {
		"major" : 4,
		"minor" : 22
	},
	"ok" : 1
}

3. View the currently active sessions

3.1 db.currentOp()

In session 1, run db.fsyncLock()
In session 2, run db.cc.insert({"name":"aa"})
In session 3, run db.currentOp()

> db.currentOp()
{
	"inprog" : [
		{
			"desc" : "conn10",
			"threadId" : "0x3fdf860",
			"connectionId" : 10,
			"opid" : 380692,                        // the opid that db.killOp() takes
			"active" : true,                        // whether the operation is active
			"secs_running" : 4,                     // running time (seconds)
			"microsecs_running" : NumberLong(4603324),
			"op" : "insert",                        // operation type
			"ns" : "test.cc",                       // namespace (database.collection) the operation runs on
			"insert" : {                            // the operation being executed
				"_id" : ObjectId("5bee323020e268b4d947a580"),
				"name" : "aa"
			},
			"client" : "127.0.0.1:42066",           // client that issued the operation
			"numYields" : 0,
			"locks" : {                             // locks the operation needs to hold
				"Global" : "w"
			},
			"waitingForLock" : true,                // whether the operation is waiting for a lock
			"lockStats" : {
				"Global" : {
					"acquireCount" : {
						"r" : NumberLong(1),
						"w" : NumberLong(1)
					},
					"acquireWaitCount" : {
						"w" : NumberLong(1)
					},
					"timeAcquiringMicros" : {
						"w" : NumberLong(22503637)
					}
				}
			}
		}
	],
	"fsyncLock" : true,                             // whether the database is globally locked
	"info" : "use db.fsyncUnlock() to terminate the fsync write/snapshot lock"
}
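
When many operations are in progress, it helps to filter the result down to the ones that matter. The sketch below picks out active operations that are waiting for a lock and have run for at least a given number of seconds. findBlockedOps is a hypothetical helper that only reads fields shown in the output above.

```javascript
// Return a short summary of active operations that are waiting for a lock
// and have been running for at least `minSeconds`.
// `currentOpResult` is assumed to have the shape returned by db.currentOp().
function findBlockedOps(currentOpResult, minSeconds) {
  return currentOpResult.inprog
    .filter((op) => op.active && op.waitingForLock && (op.secs_running || 0) >= minSeconds)
    .map((op) => ({ opid: op.opid, op: op.op, ns: op.ns, secs_running: op.secs_running }));
}

// Trimmed copy of the output above, used as test data.
const sample = {
  inprog: [
    { opid: 380692, active: true, secs_running: 4, op: "insert", ns: "test.cc", waitingForLock: true },
  ],
};
const blocked = findBlockedOps(sample, 3);
// blocked contains one entry, with opid 380692.
```

In the shell, the same function could be applied directly as findBlockedOps(db.currentOp(), 3), and the returned opid values are the ones db.killOp() expects.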

Two points are worth noting:

1. After the mongo shell disconnects, the connection is closed, but the thread serving the request does not end. The thread exits only after the command has finished and, on trying to return the result to the client, the thread finds that the connection is closed.

2. MongoDB does not end a request immediately after killOp is sent.

The service thread for each connection stores a killPending field, and the request code checks this field only at specific points. After killOp is sent, the request keeps running until it reaches the next check point, and the operation is terminated only once killPending is found to be 1.

3.2 Kill slow sessions

> db.killOp(380692)
{ "info" : "attempting to kill op" }

db.killOp(opid) works as follows:

    The service thread corresponding to each connection stores a killPending field. When killOp is sent, the field is set to 1. While a request is executing, its code can periodically call OperationContext::checkForInterrupt() to check whether killPending is set. If it is set, the thread exits.

    For a request to support killOp, checkForInterrupt() must be called in that request's processing logic. Otherwise, even if killOp is sent, the thread can exit only after the request has been completely processed.
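
The cooperative nature of this mechanism can be sketched in a few lines. The class below is a toy stand-in written for illustration. It borrows the names OperationContext, killPending and checkForInterrupt from the description above but is not MongoDB's C++ implementation, and processDocuments is a hypothetical request handler.

```javascript
// Toy stand-in for the per-operation state described above (not MongoDB code).
class OperationContext {
  constructor() {
    this.killPending = false;
  }
  // What killOp does in this model: it only sets the flag.
  kill() {
    this.killPending = true;
  }
  // Called by the request code at its check points.
  checkForInterrupt() {
    if (this.killPending) throw new Error("operation was interrupted");
  }
}

// A long-running request that checks the flag once per document.
function processDocuments(opCtx, docs, handle) {
  let processed = 0;
  for (const doc of docs) {
    opCtx.checkForInterrupt(); // check point
    handle(doc);
    processed += 1;
  }
  return processed;
}

const opCtx = new OperationContext();
const seen = [];
let outcome;
try {
  processDocuments(opCtx, [1, 2, 3, 4, 5], (doc) => {
    seen.push(doc);
    if (doc === 3) opCtx.kill(); // killOp arrives while document 3 is being handled
  });
  outcome = "completed";
} catch (err) {
  outcome = err.message;
}
// seen is [1, 2, 3]: document 3 still finishes, and the request stops at the next check point.
```

If processDocuments never called checkForInterrupt(), the flag would be set but all five documents would still be processed, which matches the behavior described for requests without check points.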

Origin blog.csdn.net/weixin_37692493/article/details/113757888