SQL Server 排查 CPU 占用高的情况。

有朋友叫我帮他看一下数据库,操作系统是Windows 2008 R2,数据库是 SQL 2008 R2 64位。硬件配置还是比较高的,64G内存,16核CPU,他说服务器运行的是金蝶K3软件,数据库实例里有多个数据库。

现象

他说是这几天才出现的,而且在每天的某一个时间段才会出现CPU占用高的情况

内存占用不太高,只占用了30个G

CPU占用100%

排查方向

一般排查都是用下面的脚本,一般会用到三个视图 sys.sysprocesses ,dm_exec_sessions ,dm_exec_requests

1

2

3

4

5

USE master

GO

--如果要指定数据库就把注释去掉

SELECT * FROM sys.[sysprocesses] WHERE [spid]>50 --AND DB_NAME([dbid])="gposdb"

SELECT COUNT(*) FROM [sys].[dm_exec_sessions] WHERE [session_id]>50

看一下当前的数据库用户连接有多少

然后使用下面语句看一下各项指标是否正常,是否有阻塞,这个语句选取了前10个最耗CPU时间的会话

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

SELECT TOP 10

[session_id],

[request_id],

[start_time] AS "开始时间",

[status] AS "状态",

[command] AS "命令",

dest.[text] AS "sql语句",

DB_NAME([database_id]) AS "数据库名",

[blocking_session_id] AS "正在阻塞其他会话的会话ID",

[wait_type] AS "等待资源类型",

[wait_time] AS "等待时间",

[wait_resource] AS "等待的资源",

[reads] AS "物理读次数",

[writes] AS "写次数",

[logical_reads] AS "逻辑读次数",

[row_count] AS "返回结果行数"

FROM sys.[dm_exec_requests] AS der

CROSS APPLY

sys.[dm_exec_sql_text](der.[sql_handle]) AS dest

WHERE [session_id]>50 AND DB_NAME(der.[database_id])="gposdb"

ORDER BY [cpu_time] DESC

如果想看具体的SQL语句可以执行下面的SQL语句,记得在SSMS里选择以文本格式显示结果

1

2

3

4

5

6

7

--在SSMS里选择以文本格式显示结果

SELECT TOP 10

dest.[text] AS "sql语句"

FROM sys.[dm_exec_requests] AS der

CROSS APPLY

sys.[dm_exec_sql_text](der.[sql_handle]) AS dest  WHERE [session_id]>50

ORDER BY [cpu_time] DESC

模拟了一些耗CPU时间的动作

还有查看CPU数和user scheduler数和最大工作线程数,检查worker是否用完也可以排查CPU占用情况

1

2

3

4

--查看CPU数和user scheduler数目

SELECT cpu_count,scheduler_count FROM sys.dm_os_sys_info

--查看最大工作线程数

SELECT max_workers_count FROM sys.dm_os_sys_info

查看机器上的所有schedulers包括user 和system

通过下面语句可以看到worker是否用完,当达到最大线程数的时候就要检查blocking了

对照下面这个表

各种CPU和SQLSERVER版本组合自动配置的最大工作线程数

CPU数   32位计算机      64位计算机

<=4         256                    512

8            288                    576

16           352                    704

32           480                    960

1

2

3

4

5

6

7

8

SELECT

scheduler_address,

scheduler_id,

cpu_id,

status,

current_tasks_count,

current_workers_count,active_workers_count

FROM sys.dm_os_schedulers

如果SQLSERVER存在要等待的资源,那么执行下面语句就会显示出会话中有多少个worker在等待

结合[sys].[dm_os_wait_stats]视图,如果当前SQLSERVER里面没有任何等待资源,那么下面的SQL语句不会显示任何结果

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

SELECT TOP 10

[session_id],

[request_id],

[start_time] AS "开始时间",

[status] AS "状态",

[command] AS "命令",

dest.[text] AS "sql语句",

DB_NAME([database_id]) AS "数据库名",

[blocking_session_id] AS "正在阻塞其他会话的会话ID",

der.[wait_type] AS "等待资源类型",

[wait_time] AS "等待时间",

[wait_resource] AS "等待的资源",

[dows].[waiting_tasks_count] AS "当前正在进行等待的任务数",

[reads] AS "物理读次数",

[writes] AS "写次数",

[logical_reads] AS "逻辑读次数",

[row_count] AS "返回结果行数"

FROM sys.[dm_exec_requests] AS der

INNER JOIN [sys].[dm_os_wait_stats] AS dows

ON der.[wait_type]=[dows].[wait_type]

CROSS APPLY

sys.[dm_exec_sql_text](der.[sql_handle]) AS dest

WHERE [session_id]>50

ORDER BY [cpu_time] DESC

比如我当前执行了查询SalesOrderDetail_test表100次,由于表数据非常多,所以SSMS需要把SQLSERVER执行的结果慢慢的取走,

造成了ASYNC_NETWORK_IO等待

1

2

3

4

USE [AdventureWorks]

GO

SELECT * FROM dbo.[SalesOrderDetail_test]

GO 100

问题源头

经过排查和这几天的观察情况,确定是某些表缺失索引导致,现在在这些表上增加了索引,问题解决

1

2

3

4

5

6

select * from t_AccessControl --权限控制表权限控制

select * from t_GroupAccess --用户组权限表用户组权限

select * from t_GroupAccessType --用户组权限类表用户组权限类

select * from t_ObjectAccess --对象权限表对象权限

select * from t_ObjectAccessType --对象权限类型表对象权限类型

select * from t_ObjectType --对象类型表对象类型

查询CPU占用高的语句

1

2

3

4

5

6

7

8

9

10

11

SELECT TOP 10

total_worker_time/execution_count AS avg_cpu_cost, plan_handle,

execution_count,

(SELECT SUBSTRING(text, statement_start_offset/2 + 1,

(CASE WHEN statement_end_offset = -1

THEN LEN(CONVERT(nvarchar(max), text)) * 2

ELSE statement_end_offset

END - statement_start_offset)/2)

FROM sys.dm_exec_sql_text(sql_handle)) AS query_text

FROM sys.dm_exec_query_stats

ORDER BY [avg_cpu_cost] DESC

查询缺失索引

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

SELECT

DatabaseName = DB_NAME(database_id)

,[Number Indexes Missing] = count(*)

FROM sys.dm_db_missing_index_details

GROUP BY DB_NAME(database_id)

ORDER BY 2 DESC;

SELECT TOP 10

[Total Cost] = ROUND(avg_total_user_cost * avg_user_impact * (user_seeks + user_scans),0)

, avg_user_impact

, TableName = statement

, [EqualityUsage] = equality_columns

, [InequalityUsage] = inequality_columns

, [Include Cloumns] = included_columns

FROM sys.dm_db_missing_index_groups g

INNER JOIN sys.dm_db_missing_index_group_stats s

ON s.group_handle = g.index_group_handle

INNER JOIN sys.dm_db_missing_index_details d

ON d.index_handle = g.index_handle

ORDER BY [Total Cost] DESC;

(查看大图)

定位问题后,新建非聚集索引

1

2

3

4

5

6

7

CREATE NONCLUSTERED INDEX IX_t_AccessControl_F4 ON dbo.t_AccessControl

(

FObjectType

)include([FUserID], [FAccessType], [FAccessMask]) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

drop index IX_t_AccessControl_F4 on t_AccessControl

CPU占用恢复正常

跟踪模板和跟踪文件下载,请使用SQL2008R2 版本:files.cnblogs.com/lyhabc/跟踪模板和trace.rar

总结

从多次历史经验来看,如果CPU负载持续很高,但内存和IO都还好的话,这种情况下,首先想到的一定是索引问题,十有八九错不了。

注意文章开头贴出的客户机器负载情况图

发布了60 篇原创文章 · 获赞 52 · 访问量 11万+

猜你喜欢

转载自blog.csdn.net/yihuliunian/article/details/104696886