Case studies: putting pressure on the CPU, I/O, and memory with stress

Analysis of the stress load-generation command


 

1. stress --cpu 1 --timeout 600: analyze the phenomenon. Why is the load so high? Use top to see that user-space CPU usage is very high (consumed by the stress process).

 

From the output we can see that the load is high, user-mode CPU usage is 100%, and the CPU used by the stress process is also close to 100%.
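A minimal way to reproduce this and take one-shot snapshots alongside it (the worker count and 600 s timeout are the ones used above; top's -b/-n flags produce a single batch snapshot instead of the interactive view):

stress --cpu 1 --timeout 600 &      # one CPU-bound worker for 10 minutes, in the background
uptime                              # the 1-minute load average creeps toward 1
top -b -n 1 | head -n 15            # user CPU (%us) near 100, the stress process near 100% CPU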

Question: why is the load close to 1?

# vmstat 1    view the monitoring information

Load = r + b, which is an instantaneous value.

In the vmstat output, r + b is 1, so the load is 1.

The reason the load is 1 rather than 2: only one CPU core is doing work and only one process is consuming CPU, so the load here is 1, not 2.
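A small sketch of that check, assuming vmstat's default column layout (r and b are the first two columns):

vmstat 1 5          # 5 samples, one second apart
# r = tasks running or waiting for a CPU, b = tasks blocked in uninterruptible sleep;
# with one stress worker on one core, r stays at 1 and b at 0, matching a load of 1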

 

2. stress -i 1 --timeout 600: analyze the phenomenon. top shows the load rising and kernel-mode CPU high, with stress consuming a fair amount of CPU and iowait waiting appearing; then use iostat -x 3 and pidstat -d to dig into the disk I/O.

 

Under normal circumstances iowait here should not stay at 0; it should rise. That it does not is probably due to the operating-system or stress version, so use stress-ng, the enhanced version, instead.

Download: https://kernel.ubuntu.com/~cking/tarballs/stress-ng/

The installation steps are the same as for stress: download the source, compile, and install.

Newer versions of stress-ng need a newer gcc to compile; the gcc I had here was version 4.4.7.

gcc download: http://www.gnu.org/prep/ftp.html
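A typical source build might look like the following; the tarball version below is only an example from the download page above, and stress-ng normally builds with a plain make (no ./configure step):

gcc --version                           # check the compiler first; an old gcc (e.g. 4.4.x) may fail on newer releases
tar xf stress-ng-0.09.50.tar.xz         # illustrative version number
cd stress-ng-0.09.50
make
sudo make install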

Analysis steps:

In top we can see iowait, so the load may be the result of waiting for I/O.

1. iostat -x 3

iostat shows how busy the disk is; this figure (%util) should be above 20%, and we could see the disk being busy along with the read/write activity. The operating-system version I have here is too old, so the numbers are very small.
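A sketch of what to look at in that output (the column names are the standard sysstat ones):

iostat -x 3         # extended per-device statistics every 3 seconds
# %util - how busy the device is (the "more than 20%" figure mentioned above)
# w/s   - write requests per second
# wkB/s - write throughput, showing how much data those writes move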

There are also quite a lot of writes here, suspected to be 600+ per second, so the disk busyness is caused by heavy writing; below, pidstat is used to analyze further.

2. pidstat -d 3: view per-process reads, writes, and I/O delay

Looking at iodelay here, java (tomcat) and stress show up every time.

The write activity here is fairly heavy.
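A sketch of the per-process view (the iodelay column needs a reasonably recent sysstat; older pidstat builds do not print it):

pidstat -d 3        # per-process disk statistics every 3 seconds
# kB_rd/s, kB_wr/s - read and write throughput per process
# iodelay          - clock ticks the task spent blocked waiting for I/O;
#                    here both java (tomcat) and stress keep showing non-zero values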

 

 

3. stress -c 8 --timeout 600

 

Phenomenon: user CPU is completely saturated, the load rises quickly and soon reaches 8, and each process gets a bit over 12% of the CPU; these 8 stress workers are what saturate the CPU.
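To reproduce this on the same single-core machine and see the roughly 12% share per worker (8 workers sharing one CPU means each gets about 100% / 8 ≈ 12.5%):

stress -c 8 --timeout 600 &             # 8 CPU-bound workers in the background
top -b -n 1 | grep stress               # each stress process sits around 12-13% CPU
uptime                                  # the load average climbs toward 8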

 vmstat 1

Load = r + b = 8

pidstat 3

Here %wait is the percentage of time the process spends waiting for its turn on the CPU; the higher it is, the longer the process queues for the CPU. It is not the same thing as iowait.
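A sketch of reading that column (the %wait column only exists in newer sysstat releases; older pidstat builds do not print it):

pidstat -u 3        # per-process CPU usage every 3 seconds (-u is the default report)
# %wait - share of time the task was runnable but waiting for a CPU, i.e. run-queue
#         delay; not the same as %iowait, which is time spent waiting for I/O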

With 8 processes competing for the CPU, interrupts and context switches will be somewhat higher.

vmstat 3    view interrupts (in) and context switches (cs)

cs here should reach several hundred thousand or more; the numbers on my machine are off.
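To pull out just those two columns, a small sketch assuming vmstat's default field layout (in is the 11th field, cs the 12th; stdbuf keeps the pipe from buffering):

stdbuf -oL vmstat 3 | awk 'NR > 2 { print "in=" $11, "cs=" $12 }'
# in = interrupts per second, cs = context switches per second; the expectation above
# is that cs reaches the hundreds of thousands when 8 runnable workers share one core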

 

Case: in one load test the machine had a 4-core CPU; 20 threads were enough to saturate the CPU at 100%.

For a Java process running on Tomcat, TPS was roughly 90+ at 20 concurrent users, 80+ at 30 concurrent users, and 70+ at 80 concurrent users.

The CPU was saturated in every case; as concurrency grew, response time kept increasing and TPS kept falling.

Why?

The suspicion was that the rising response time was caused by CPU context switching making the threads wait longer.

We had Tomcat log its overall processing time and also log one interface's processing time: the interface's processing time grew from 100 ms to 200 ms, but Tomcat's overall processing time grew from 1 s to 8 s.

As concurrency increased, queueing time in the Tomcat thread pool grew from 1 s to more than 8 s. Where did the time go? It went into thread context switching.
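One way to confirm that during the test is to watch per-thread context switches for the Tomcat process (the pgrep expression is illustrative; -t adds threads, -w adds the cswch/s and nvcswch/s columns):

pidstat -w -t -p "$(pgrep -f tomcat | head -n 1)" 3
# high involuntary switch rates (nvcswch/s) on the worker threads mean they keep being
# scheduled off the CPU, which is where the extra queueing time goes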

 

4. sysbench --threads=10 --max-time=300 threads run

 

Phenomenon: the load is very high and most of the CPU time is spent in kernel mode. Who is using the kernel-mode CPU? There is no iowait, no interrupt storm, and it is not virtualization, so what is saturating the kernel CPU?

The most likely cause is context switching: context switches between processes push kernel-mode CPU usage up.
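A sketch of reproducing this case and confirming where the CPU time goes (mpstat is part of sysstat; the sysbench options are the ones from the command above):

sysbench --threads=10 --max-time=300 threads run &
mpstat -P ALL 3             # %sys climbs on every CPU while %iowait stays near 0,
                            # pointing at kernel-side scheduling work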

 # vmstat 1

You can see in the output that cs (context switches) is extremely high, reaching the millions.

# pidstat -w    look at context switches, but nothing useful shows up here. Why not?

Because by default pidstat -w only reports process-level context switches, while sysbench does its switching inside its threads; adding -t (pidstat -wt) shows the per-thread switches.

cswch, voluntary context switches: switches that happen when a process cannot get a resource it needs, for example when system resources such as I/O or memory are insufficient.

nvcswch, involuntary context switches: switches that happen when a process's time slice expires and the kernel forcibly reschedules it, for example when a large number of processes are competing for the CPU.
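A sketch of that per-thread view (-t includes threads; without it only the parent process is listed and the numbers look harmless):

pidstat -w -t 3             # cswch/s = voluntary, nvcswch/s = involuntary switches per second
# the sysbench worker threads show context-switch rates in the tens of thousands per
# second or more, which is where the kernel-mode CPU time is going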

 

 

Analysis of running the Python scripts


 

5. app.py

 

6. iolatency.py

 


Origin: www.cnblogs.com/wuzm/p/11281621.html