[Essentials] SQL Server - Locating and resolving CPU problems

 
 
The basic process for locating a CPU problem:
 
Performance Counter Diagnostics

The main performance counters used

  1. Processor: % Processor Time, _Total instance (mainly used to view the overall CPU status of the server)
  2. Process: % Processor Time, sqlservr instance (mainly used to view the CPU usage of the database service)
 

Step 1. Rule out other applications as the cause of CPU load

    Comparing these two counters at the same point in time shows whether the CPU is being consumed by other applications on the server. In the figure, the server-wide counter (red line) rises sharply at about 17:10 while the SQL Server service (green line) does not rise noticeably, which means the CPU pressure in this period is not caused by the database.

The red line rises so sharply because I ran a file compression on the server that hosts the database. Operations such as file compression can use a lot of CPU and hurt database performance.
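The comparison in Step 1 can be sketched in a few lines of Python. This is only an illustration, not part of any SQL Server tooling: the `Sample` type, the `busy` and `sql_share` thresholds, and the sample values are all made up, and the sketch assumes both counters have already been exported and normalized to the same 0-100 scale.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    time: str         # timestamp label of the counter sample
    total_cpu: float  # server-wide CPU, percent (0-100)
    sql_cpu: float    # sqlservr CPU, assumed normalized to the same 0-100 scale


def non_sql_pressure(samples, busy=80.0, sql_share=0.5):
    """Return the times at which the server is busy but sqlservr accounts for
    less than `sql_share` of the total, so another process is the likely cause.
    Both thresholds are hypothetical and should be tuned per system."""
    flagged = []
    for s in samples:
        if s.total_cpu >= busy and s.sql_cpu < s.total_cpu * sql_share:
            flagged.append(s.time)
    return flagged


# Made-up samples shaped like the figure: a spike at 17:10 without SQL Server.
samples = [
    Sample("17:00", 25.0, 20.0),
    Sample("17:05", 30.0, 22.0),
    Sample("17:10", 95.0, 23.0),  # server busy, sqlservr flat
    Sample("17:15", 92.0, 85.0),  # server busy, driven by sqlservr
]
print(non_sql_pressure(samples))
```

Only the 17:10 sample is flagged, because the 17:15 spike is matched by a rise in the sqlservr counter.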

 

 

Step 2. CPU problem location

 

Continuously high CPU during peak hours

 

 

CPU fluctuates regularly

 

 

CPU suddenly soars

  

       At 9:00 in the figure, CPU usage soars from an average of about 20% to 100%.

 

 

Step 3. CPU problem analysis and resolution (general steps)

 

 First of all, be aware that 90% of problems may be concentrated in 10% of scenarios. When the CPU stays high, check the following two points first:

  1. Has the degree of parallelism been tuned for your database?
  2. Is your database missing indexes, so that frequently run queries consume a lot of CPU?

 

Degree of parallelism and the cost threshold for parallelism

    What is the maximum degree of parallelism? Put simply, it is the maximum number of CPUs that can be used to execute a single statement. It may seem obvious that the more CPUs a statement uses, the faster it runs. The answer is a capital "NO". With too many CPUs, the threads spend a long time coordinating with each other, which makes the statement very slow and burns a lot of CPU time. CPU usage climbs, and the CPU itself becomes the bottleneck.

  For a given statement, compare Duration (the elapsed execution time) with CPU time: when the degree of parallelism is left unset, a parallel plan can consume a great deal of CPU and still make the statement run slower.

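    So is fewer always better? Nothing is absolute; it depends on the situation. If the system needs to operate on fairly large volumes of data, using multiple CPUs in parallel brings a big improvement.

The general recommendation is: if the system has more than 32 CPUs, set the degree of parallelism to 8 or 4; if the workload consists entirely of very short, very frequent statements, consider setting it to 1 (this disables parallelism for statements, so be careful and make sure it really fits your scenario).

Note: the right degree of parallelism often depends on the server's CPU configuration, such as the number of sockets, the number of cores, and whether hyper-threading is enabled. In general it is better not to span physical CPUs. The degree of parallelism is an instance-level setting (in SQL Server 2016 it can also be set per database).

     The cost threshold for parallelism mainly controls when the SQL optimizer chooses a parallel plan. Keeping the default value is recommended; the lower the value, the more readily the optimizer chooses a parallel plan.

    How to set the degree of parallelism and the threshold is shown in the figure below: the system default for the degree of parallelism is 0, and the default threshold is 5.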
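As a text alternative to the settings dialog, the two options can also be changed with `sp_configure`. The Python sketch below only builds the T-SQL script as a string; it does not connect to a server. The helper names `suggest_maxdop` and `build_parallelism_script` are hypothetical, and `suggest_maxdop` simply encodes the rule of thumb given above (picking 8 from the "8 or 4" recommendation is an arbitrary choice for illustration).

```python
def suggest_maxdop(logical_cpus, short_frequent_only=False):
    """Encode the rule of thumb from the text. Returns None when the text
    gives no specific recommendation (32 CPUs or fewer, mixed workload)."""
    if short_frequent_only:
        return 1  # disables statement parallelism; verify it fits the workload
    if logical_cpus > 32:
        return 8  # the text suggests 8 or 4; 8 is chosen here arbitrarily
    return None


def build_parallelism_script(maxdop, cost_threshold=5):
    """Build a T-SQL script that sets the instance-level options
    'max degree of parallelism' and 'cost threshold for parallelism'."""
    if maxdop < 0 or cost_threshold < 0:
        raise ValueError("both values must be zero or positive")
    statements = [
        # Both options are advanced options, so they must be made visible first.
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max degree of parallelism', {maxdop};",
        f"EXEC sp_configure 'cost threshold for parallelism', {cost_threshold};",
        "RECONFIGURE;",
    ]
    return "\n".join(statements)


maxdop = suggest_maxdop(logical_cpus=64)
script = build_parallelism_script(maxdop)
print(script)
```

Review the generated script before running it against an instance; the cost threshold is left at its default of 5, as the text recommends.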

 

 

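Statements causing high CPU

    High CPU caused by statements is also one of the most common problems. So how do you tune statements to reduce CPU consumption? Only a brief explanation is given here; for detailed statement tuning and for using parameterization to reduce statement compilation, see the later articles in this series.

There are many ways to tune statements. The ones most commonly used for CPU problems are:

  1. Add indexes to lower the cost of the statement. When execution needs fewer resources, it naturally consumes less CPU.
  2. Reduce the complexity of the statement so that SQL Server can execute it efficiently (this is also a way of reducing resource consumption).
  3. Analyze whether the statement can use a serial plan.
  4. Parameterize statements in the front-end application wherever possible to reduce compilation cost.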
 

 

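Step 4. CPU problem analysis and resolution (special troubleshooting steps)

Continuously high

    A continuously high CPU is most likely caused by a few unoptimized statements running frequently, or by a large number of unoptimized statements. This kind of problem usually has to be handled in stages.

First adjust the parameters described in the general steps and create the missing indexes in bulk, then collect data and analyze again. If the CPU is still continuously high, trace the specific statements that run during peak hours and reduce their resource consumption. At the same time, analyze where the pressure comes from: are there still many unoptimized statements, or can the CPU genuinely not support the workload (see the section on CPU that is genuinely too high)?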
 
 
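Regular fluctuation


If you maintain the system and a CPU chart like this does not give you any ideas, please take the time to get to know your system properly.

    This chart clearly shows the CPU rising once every half hour. Do not rush to find the statements at those points in time. At the very least, first think about which operations in the system run once every half hour. A SQL job? A scheduled task? A timed process in the front end? And so on.

    Is that regular scheduled process behaving abnormally? Has anything been changed recently? Is the result of its execution what you expect?

    The problem may be located just that clearly...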
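To check whether spikes really recur at a fixed interval, the gaps between spike start times can be computed from exported counter samples. This is a minimal sketch under stated assumptions: the sample data is made up, the 80% threshold is hypothetical, and `spike_starts` and `intervals_minutes` are illustrative helper names.

```python
from datetime import datetime


def spike_starts(samples, threshold=80.0):
    """Return the timestamp at which each spike begins, where a spike is a
    run of consecutive samples at or above `threshold` (hypothetical value)."""
    starts = []
    above = False
    for ts, cpu in samples:
        if cpu >= threshold and not above:
            starts.append(ts)
        above = cpu >= threshold
    return starts


def intervals_minutes(starts):
    """Return the gap in whole minutes between consecutive spike starts."""
    return [int((b - a).total_seconds() // 60) for a, b in zip(starts, starts[1:])]


def parse(hhmm):
    return datetime.strptime(hhmm, "%H:%M")


# Made-up samples: (time, CPU %) pairs with a spike every half hour.
samples = [(parse(t), cpu) for t, cpu in [
    ("09:00", 85), ("09:05", 90), ("09:10", 20), ("09:20", 18),
    ("09:30", 88), ("09:35", 25), ("09:50", 22),
    ("10:00", 91), ("10:05", 19),
]]

starts = spike_starts(samples)
intervals = intervals_minutes(starts)
print(intervals)
```

If the intervals come out nearly identical, as they do for this sample data, the next step is to list everything scheduled at that frequency rather than to hunt for individual statements.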

 
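Sudden spike


A sudden CPU spike may be a one-off, and you may decide that it does not matter. Before you judge it to be a one-off, though, make sure you have done the following:

  1. Checked which statements were running at the time of the spike and whether they were abnormal
  2. Analyzed the system logs for anomalies at the time of the spike
  3. Checked which special applications are running on the server
  4. Checked the state of the database
  5. Turned on monitoring immediately, to be ready for the next sudden spike

    Once the anomalies above are ruled out, the most likely cause is that one or more statements in the database ran abnormally, or were very poorly optimized, at that moment. If the cause really is a statement and it happened only once, it may not be a problem, but we should still try to find the statement that was running and examine it.

Find the corresponding point in time and see exactly which statements were running



    Then analyze that statement to find out exactly why!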

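CPU genuinely too high

    If CPU pressure is still evident after all this analysis and tuning, and the hardware truly cannot support the workload, then it is time for heavier approaches: redesigning the application with vertical or horizontal partitioning, adding hardware, separating reads from writes to spread the load, building a load-balanced cluster, and so on.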

 

-----------------------------------------------------------------------------------------------------

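  Summary: most users can resolve CPU pressure by adjusting the degree of parallelism and optimizing the statements that run on the system.

      In addition, monitoring and analyzing the system plays a vital role in diagnosing and resolving problems.

      Always analyze carefully before drawing a conclusion; a decision based on assumptions can have serious consequences.

     Does your system really need more hardware, or a fancier architecture?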

 
