Linux commands for checking server performance

1.1 Viewing CPU information

1. Check the number of physical CPUs:

grep "physical id" /proc/cpuinfo | sort -u | wc -l

2. View the number of cores in each physical CPU:

grep "cpu cores" /proc/cpuinfo | uniq

3. View the number of logical CPUs:

grep -c "^processor" /proc/cpuinfo

Number of physical CPUs * number of cores per CPU = number of logical CPUs (when hyper-threading is not supported or is disabled; with hyper-threading, multiply by the number of threads per core as well).
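The three counts can be collected in one place. The following is a minimal sketch, assuming an x86-style /proc/cpuinfo that contains "physical id" and "cpu cores" lines; cpu_topology is a hypothetical helper name, not a standard command.

```shell
# cpu_topology [FILE]: summarise sockets, cores per socket and logical CPUs.
# FILE defaults to /proc/cpuinfo; the function name is illustrative only.
cpu_topology() {
    file="${1:-/proc/cpuinfo}"
    # Distinct "physical id" values = number of physical CPUs (sockets).
    sockets=$(grep "^physical id" "$file" | sort -u | wc -l | tr -d ' ')
    # "cpu cores" is repeated for every logical CPU, so read it only once.
    cores=$(grep "^cpu cores" "$file" | sort -u | awk -F': *' 'NR == 1 { print $2 }')
    # One "processor" line per logical CPU.
    logical=$(grep -c "^processor" "$file")
    echo "sockets=$sockets cores_per_socket=$cores logical=$logical"
}
```

Calling cpu_topology without arguments reads the live /proc/cpuinfo. If logical is larger than sockets * cores_per_socket, hyper-threading is most likely enabled.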

1.2 Memory View

1. Check the memory usage:

# free -m
             total       used       free     shared    buffers     cached
Mem:          3949       2519       1430          0        189       1619
-/+ buffers/cache:        710       3239
Swap:         3576          0       3576


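total: total amount of memory

used: amount of memory in use

free: amount of idle memory

shared: total memory shared by multiple processes

-buffers/cache: memory actually in use, that is, used - buffers - cached

+buffers/cache: memory actually available, that is, free + buffers + cached


The buffer cache serves reads and writes of disk blocks;

the page cache serves reads and writes of file data (per inode). Both caches shorten the time spent in I/O system calls.



From the operating system's point of view, free/used is the memory that is available/occupied;

from an application's point of view, the -/+ buffers/cache line gives the available/occupied memory, because buffers and cache can be handed over quickly when applications need the memory.

In day-to-day work, read the figures from the application's perspective.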
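The two formulas above can be checked by hand. This is a small sketch with a hypothetical helper, app_memory, that takes the used, free, buffers and cached figures (in MB) exactly as the Mem: line prints them.

```shell
# app_memory USED FREE BUFFERS CACHED: memory from the application's perspective.
# All values are in MB; the helper name is illustrative only.
app_memory() {
    used=$1; free=$2; buffers=$3; cached=$4
    echo "app_used=$((used - buffers - cached)) app_free=$((free + buffers + cached))"
}

# Figures from the Mem: line of the sample output above.
app_memory 2519 1430 189 1619
```

The results can differ from the -/+ buffers/cache line by 1 MB, because free rounds each column separately when converting to megabytes.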

1.3 Hard Disk View

1. View hard disk and partition information:

fdisk -l

2. Check the disk space occupation of the file system:

df -h

3. Check the I/O performance of the hard disk (displayed once every second and displayed 5 times):

iostat -x 1 5

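iostat is part of the sysstat package; install it with yum -y install sysstat.

Parameters to watch:

  1. If %util is close to 100%, too many I/O requests are being generated, the I/O system is running at full load, and the disk may be a bottleneck.

  2. If idle is below 70%, I/O pressure is fairly high, which indicates that reading processes are spending a lot of time waiting.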

4. Check the size of a directory:

du -sh /root

If a partition is nearly full, change to its mount point and use the following command to list the 10 files or directories that take up the most space, in descending order of size (in KB):

du -sk * | sort -rn | head -n 10
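The listing can be wrapped in a small function so that it works on any directory. This is a sketch only; largest_entries is a hypothetical name, and sizes are requested in KB (-k) rather than in human-readable form so that a plain numeric sort orders them correctly.

```shell
# largest_entries [DIR] [N]: print the N largest entries directly under DIR.
# Sizes are in KB. Hidden files are skipped because * does not match them.
largest_entries() {
    dir="${1:-.}"
    n="${2:-10}"
    # -s gives one total per argument; sort -rn puts the largest first.
    ( cd "$dir" && du -sk -- * 2>/dev/null | sort -rn | head -n "$n" )
}
```

For example, largest_entries /var 5 would list the five largest items under /var.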

1.4 View average load

Sometimes the system responds very slowly and the cause is not obvious. In that case, check the load average to see whether a large number of processes are waiting in the queue.

Load average is the average length of the system's run queue over a period of time; it can also be thought of as the average number of runnable processes.

Taking road traffic as an analogy, a single-core CPU corresponds to a single lane:


  • A value between 0.00 and 1.00 means traffic is flowing well: there is no congestion and vehicles pass without hindrance.
  • 1.00 means the road is exactly full. Traffic still flows, but any additional vehicle will cause congestion. The system has no spare capacity, and the administrator should start optimizing.
  • A value above 1.00 means the road is congested. At 2.00 there are as many vehicles waiting to get onto the bridge as there are on it. This situation must be investigated.

A multi-core CPU corresponds to multiple lanes:


On a multi-core CPU, full load corresponds to 1.00 * the number of CPU cores: 2.00 for a dual-core CPU and 4.00 for a quad-core CPU.

Processes consume CPU, memory, disk I/O, network I/O, and other resources, so the load average does not reflect CPU usage alone: memory, disk, and network activity can also affect it.
On a single-core processor, a load average of 1 or less means the system handles its work easily. At 3 the system looks very busy, and at 5 or 8 it can no longer keep up (5 and 8 are disputed thresholds; to be conservative, use the lower one).
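Because full load scales with the number of cores, it can help to divide the load average by the core count before comparing it with 1.00. A minimal sketch, using a hypothetical helper named load_per_core:

```shell
# load_per_core LOAD CORES: load average normalised to a single core.
# A result near or above 1.00 means the CPUs are fully loaded.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# A load of 2.00 on a quad-core machine uses half of the capacity.
load_per_core 2.00 4
```

On a live system the arguments could come from cut -d' ' -f1 /proc/loadavg and nproc, assuming both are available.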

View load average data

The load average can be seen in the following commands

# top 
# uptime 
# w


The three load average values are the average system load over the last 1, 5, and 15 minutes.

As a rule of thumb, focus on the 5-minute and 15-minute averages: the 1-minute average fluctuates too much, and a momentary burst of concurrency makes it swing sharply.
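On Linux the same three values are also exposed as the first three fields of /proc/loadavg. The sketch below labels them; show_loadavg is a hypothetical helper name, and the optional file argument exists only so the function can be tried on saved data.

```shell
# show_loadavg [FILE]: print the 1-, 5- and 15-minute load averages with labels.
# FILE defaults to /proc/loadavg, whose first three fields hold the averages.
show_loadavg() {
    awk '{ printf "1min=%s 5min=%s 15min=%s\n", $1, $2, $3 }' "${1:-/proc/loadavg}"
}
```

Running show_loadavg with no argument reports the current values, which makes it easy to compare the 5- and 15-minute figures at a glance.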

1.5 vmstat command to determine whether the system is busy

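You can also use the vmstat command to determine whether the system is busy. The fields in its output are:

procs

r: number of processes waiting to run.

b: number of processes in uninterruptible sleep.

w: number of runnable processes that have been swapped out (not every vmstat version shows this column).

memory

swpd: amount of virtual memory (swap) in use, in KB.

free: idle memory, in KB.

buff: memory used as buffers, in KB.

swap

si: amount of memory swapped in from disk, in KB per second.

so: amount of memory swapped out to disk, in KB per second.

io

bi: blocks received from block devices (reads), in blocks per second.

bo: blocks sent to block devices (writes), in blocks per second.

system

in: interrupts per second, including the clock interrupt.

cs: context switches per second.

cpu

Shown as percentages of total CPU time.

us: time spent running user code.

sy: time spent running kernel (system) code.

id: idle time.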

1.6 Use the nethogs tool to view per-process network traffic

1.7 Other parameters

View the kernel version:

uname -a

Short form: uname -r

Check whether the system is 32-bit or 64-bit:

file /sbin/init

View the distribution:

cat /etc/issue

or lsb_release -a

View the loaded kernel modules:

lsmod

View PCI devices:

lspci

2.1.3 System performance analysis tool

1. Common system commands

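vmstat, sar, iostat, netstat, free, ps, top, and so on

2. Commonly used combinations

  1. vmstat, sar, iostat: check for a CPU bottleneck

  2. free, vmstat: check for a memory bottleneck

  3. iostat: check for a disk I/O bottleneck

  4. netstat: check for a network bandwidth bottleneck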

2.1.4 Linux performance evaluation and optimization

Overall system performance evaluation (uptime command):

uptime

16:38:00 up 118 days, 3:01, 5 users, load average: 1.22, 1.02, 0.91

Note:

  • As a rule, none of the three load average values should stay above the number of CPUs in the system.

    This system has 8 CPUs. If the load average stays above 8 for a long time, the CPUs are very busy and the load is high, which may affect system performance.

  • An occasional value above 8 generally does not affect system performance.

  • If the load average is below the number of CPUs, the CPUs still have idle time slices. In the output of this example, the CPUs are almost idle.

2.2.1 CPU performance evaluation

1. Use vmstat command to monitor system CPU

vmstat displays brief information about the performance of various system resources; here we mainly look at the CPU load.

The following is the output of the vmstat command on a certain system:

[root@node1 ~]# vmstat 2 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0      0 162240   8304  67032    0    0    13    21 1007   23  0  1  98  0  0
 0  0      0 162240   8304  67032    0    0     1     0 1010   20  0  1 100  0  0
 0  0      0 162240   8304  67032    0    0     1     1 1009   18  0  1  99  0  0

procs

r: the number of processes that are running or waiting for a CPU time slice. If this value stays above the number of CPUs for a long time, the system is short of CPU and more CPUs are needed.

b: the number of processes waiting for resources, for example waiting for I/O or for memory to be swapped in.

cpu

us

The percentage of CPU time consumed by user processes.
A high us value means user processes consume a lot of CPU time; if it stays above 50% for a long time, consider optimizing the program or its algorithms.

sy

The percentage of CPU time consumed by the kernel. A high sy value means the kernel consumes a lot of CPU resources.

As a rule of thumb, the reference value for us+sy is 80%. If us+sy exceeds 80%, the system may be short of CPU resources.
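The us+sy check can be automated by summing the two columns of each sample. This is a sketch under one loud assumption: it expects the 17-column vmstat layout of the sample output in this section, where us is column 13 and sy is column 14. Other vmstat versions may order the columns differently. cpu_busy_from_vmstat is a hypothetical helper name.

```shell
# cpu_busy_from_vmstat: read vmstat output on stdin, print us+sy per sample.
# Assumes the 17-column layout shown in this section (us = $13, sy = $14).
cpu_busy_from_vmstat() {
    # Header lines do not start with a number, so they are skipped.
    awk '$1 ~ /^[0-9]+$/ { print $13 + $14 }'
}

# First sample line from the output above.
echo "0 0 0 162240 8304 67032 0 0 13 21 1007 23 0 1 98 0 0" | cpu_busy_from_vmstat
```

On a live system the input would come from vmstat 2 3 piped into the function; values that stay above 80 match the warning threshold described above.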

2. Use the sar command to monitor the system CPU

sar collects separate statistics for each aspect of the system. It adds some overhead, but the overhead is measurable and does not significantly distort the statistics.

The following is the CPU statistics output of the sar command for a certain system:

[root@webserver ~]# sar -u 3 5
Linux 2.6.9-42.ELsmp (webserver)   11/28/2008   _i686_   (8 CPU)

11:41:24 AM   CPU   %user   %nice   %system   %iowait   %steal   %idle
11:41:27 AM   all    0.88    0.00      0.29      0.00     0.00   98.83
11:41:30 AM   all    0.13    0.00      0.17      0.21     0.00   99.50
11:41:33 AM   all    0.04    0.00      0.04      0.00     0.00   99.92
11:41:36 AM   all   90.08    0.00      0.13      0.16     0.00    9.63
11:41:39 AM   all    0.38    0.00      0.17      0.04     0.00   99.41
Average:      all    0.34    0.00      0.16      0.05     0.00   99.45

The output is explained as follows:

  1. %user: percentage of CPU time consumed by user processes.

  2. %nice: percentage of CPU time consumed by user processes running with an adjusted (nice) priority.

  3. %system: percentage of CPU time consumed by the kernel (system processes).

  4. %iowait: percentage of time the CPU was idle while waiting for I/O.

  5. %steal: percentage of time a virtual CPU had to wait because the hypervisor was serving another virtual machine.

  6. %idle: percentage of time the CPU was idle.

Question:

Have you ever seen overall CPU utilization stay low while the application is slow?

This can happen on a multi-CPU system when the program is single-threaded. The single thread can use only one CPU, so that CPU runs at 100% and cannot take on other requests while the remaining CPUs sit idle. The result is low overall CPU usage and a slow application.
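To spot one saturated CPU hiding behind a low average, look at utilization per CPU. The sketch below compares two saved copies of /proc/stat and prints how busy each CPU was in between. per_cpu_busy is a hypothetical helper name, and the calculation (everything except idle and iowait counts as busy) is a simplification. Where sysstat is installed, mpstat -P ALL reports similar per-CPU figures.

```shell
# per_cpu_busy BEFORE AFTER: per-CPU busy percentage between two /proc/stat copies.
per_cpu_busy() {
    awk '
        /^cpu[0-9]+ / {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user .. steal
            idle = $5 + $6                                    # idle + iowait
            if (NR == FNR) {            # first snapshot: remember the counters
                prev_total[$1] = total
                prev_idle[$1] = idle
            } else if (total > prev_total[$1]) {
                busy = 100 * (1 - (idle - prev_idle[$1]) / (total - prev_total[$1]))
                printf "%s %.0f%%\n", $1, busy
            }
        }' "$1" "$2"
}
```

A typical use is to save /proc/stat to one file, sleep for a second or more, save it again to a second file, and pass both files to per_cpu_busy. A single CPU near 100% next to idle ones matches the single-threaded pattern described above.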

2.3.1 Memory performance evaluation

1. Use the free command to monitor memory

free is the most commonly used command to monitor Linux memory usage, see the following output:

[root@webserver ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          8111       7185        926          0        243       6299
-/+ buffers/cache:        643       7468
Swap:         8189          0       8189

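Rule of thumb:

  1. Memory available to applications / physical memory > 70%: memory is plentiful and does not affect system performance.

  2. Memory available to applications / physical memory < 20%: memory is scarce and more memory should be added.

  3. 20% < memory available to applications / physical memory < 70%: memory is basically sufficient for the applications and does not affect system performance for now.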
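The rule of thumb translates directly into a small check. A minimal sketch, assuming the two inputs are the application-available memory (the second figure of the -/+ buffers/cache line) and the total physical memory, both in MB; memory_status and its labels are illustrative only.

```shell
# memory_status APP_AVAILABLE_MB TOTAL_MB: classify memory by the 70% / 20% rule.
memory_status() {
    pct=$(( $1 * 100 / $2 ))
    if [ "$pct" -gt 70 ]; then
        echo "$pct% sufficient"
    elif [ "$pct" -lt 20 ]; then
        echo "$pct% insufficient"
    else
        echo "$pct% adequate"
    fi
}

# Values from the sample free -m output in this section: 7468 MB of 8111 MB.
memory_status 7468 8111
```

For the sample system the ratio is about 92%, which falls in the first category.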

2. Use vmstat command to monitor memory

[root@node1 ~]# vmstat 2 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0      0 162240   8304  67032    0    0    13    21 1007   23  0  1  98  0  0
 0  0      0 162240   8304  67032    0    0     1     0 1010   20  0  1 100  0  0
 0  0      0 162240   8304  67032    0    0     1     1 1009   18  0  1  99  0  0

memory

  1. swpd: amount of memory moved to the swap area, in KB. An swpd value that is occasionally non-zero does not affect system performance.

  2. free: amount of idle physical memory, in KB.

  3. buff: amount of memory used as buffer cache; normally only reads and writes to block devices need buffering.

  4. cache: amount of memory used as page cache.

The page cache generally serves as the file system cache, so frequently accessed files are cached. A larger cache value means more file data is cached; if bi in the io columns is small at the same time, the file system is working efficiently.

swap

  1. si: amount of memory swapped in from disk, that is, moved from the swap area into memory.

  2. so: amount of memory swapped out to disk, that is, moved from memory into the swap area.

If si and so stay non-zero for a long time, the system is short of memory and more memory should be added.

2.4.1 Disk I/O performance evaluation

1. Basics of Disk Storage

Serve frequently accessed files and data from memory instead of direct disk I/O wherever possible; memory access can be on the order of a thousand times faster.

Separate frequently read and written files from files that rarely change, and place them on different disk devices.

For data that is written frequently, consider using raw devices instead of a file system.

Advantages of raw devices:

  1. Data is read and written directly, without passing through the operating system cache, which saves memory and avoids contention for it.

  2. File system maintenance overhead is avoided, such as maintaining the superblock and inodes.

  3. The operating system's cache read-ahead is bypassed, which reduces I/O requests.

The disadvantage of raw devices:

Data management and space management are inflexible and require highly specialized staff.

2. Use iostat to evaluate disk performance

[root@webserver ~]# iostat -d 2 3
Linux 2.6.9-42.ELsmp (webserver)   12/01/2008   _i686_   (8 CPU)

Device:    tps   Blk_read/s   Blk_wrtn/s   Blk_read    Blk_wrtn
sda       1.87         2.58       114.12    6479462   286537372

Device:    tps   Blk_read/s   Blk_wrtn/s   Blk_read    Blk_wrtn
sda       0.00         0.00         0.00          0           0

Device:    tps   Blk_read/s   Blk_wrtn/s   Blk_read    Blk_wrtn
sda       1.00         0.00        12.00          0          24

The columns mean:

  1. Blk_read/s: blocks read per second

  2. Blk_wrtn/s: blocks written per second

  3. Blk_read: total blocks read

  4. Blk_wrtn: total blocks written

The Blk_read/s and Blk_wrtn/s values give a basic picture of the disk's read and write activity.
A large Blk_wrtn/s value means the disk is written to frequently; consider optimizing the disk or the program.
A large Blk_read/s value means there are many direct disk reads; consider keeping the data that is read in memory.

Rule of thumb:

Sustained heavy reading and writing over a long period is abnormal and will affect system performance.

3. Use sar to evaluate disk performance

The sar -d command gives basic statistics on the system's disk I/O. See the following output:

[root@webserver ~]# sar -d 2 3
Linux 2.6.9-42.ELsmp (webserver)   11/30/2008   _i686_   (8 CPU)

11:09:33 PM   DEV       tps   rd_sec/s   wr_sec/s   avgrq-sz   avgqu-sz   await   svctm   %util
11:09:35 PM   dev8-0   0.00       0.00       0.00       0.00       0.00    0.00    0.00    0.00

11:09:35 PM   DEV       tps   rd_sec/s   wr_sec/s   avgrq-sz   avgqu-sz   await   svctm   %util
11:09:37 PM   dev8-0   1.00       0.00      12.00      12.00       0.00    0.00    0.00    0.00

11:09:37 PM   DEV       tps   rd_sec/s   wr_sec/s   avgrq-sz   avgqu-sz   await   svctm   %util
11:09:39 PM   dev8-0   1.99       0.00      47.76      24.00       0.00    0.50    0.25    0.05

Average:      DEV       tps   rd_sec/s   wr_sec/s   avgrq-sz   avgqu-sz   await   svctm   %util
Average:      dev8-0   1.00       0.00      19.97      20.00       0.00    0.33    0.17    0.02

Parameter meaning:

  1. await: average wait time per device I/O operation, in milliseconds

  2. svctm: average service time per device I/O operation, in milliseconds

  3. %util: percentage of each second spent on I/O operations

Criteria for evaluating disk I/O performance:

Normally svctm is smaller than await. svctm reflects disk performance, but CPU and memory load also affect it, and too many requests indirectly increase it.

  1. await depends on svctm, on the I/O queue length, and on the I/O request pattern.

  2. If svctm is close to await, there is almost no I/O waiting and disk performance is good.

  3. If await is much higher than svctm, requests wait too long in the I/O queue and applications running on the system slow down.

  4. In that case, replacing the disk with a faster one can solve the problem.

%util is an important indicator of disk I/O.

If %util is close to 100%, the disk is receiving too many I/O requests, the I/O system is working at full capacity, and the disk may be a bottleneck.

Optimize the program or replace the disk with a faster one.

2.5.1 Network performance evaluation

  1. Use ping to check network connectivity

  2. Use netstat -i to check the status of the network interfaces

  3. Use netstat -r to view the system's routing table

  4. Use sar -n to display network statistics

Part three: Linux server performance tuning

1. Adjust the Linux kernel elevator algorithm for disk I/O

After choosing a file system, pick the elevator (I/O scheduler) algorithm that best balances low-latency requirements against collecting enough requests to organize disk reads and writes efficiently.
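On Linux, the schedulers available for a block device are listed in /sys/block/DEVICE/queue/scheduler, with the active one shown in square brackets. The sketch below extracts the active entry; active_scheduler is a hypothetical helper name, and the scheduler names in the sample line are only an illustration, since the available set depends on the kernel version.

```shell
# active_scheduler: print the bracketed (active) scheduler from text on stdin.
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Sample line in the format used by /sys/block/<device>/queue/scheduler.
echo "noop anticipatory deadline [cfq]" | active_scheduler
```

For a real disk, for example sda (an assumed device name), the input would be read from /sys/block/sda/queue/scheduler. Writing one of the listed names to the same file as root switches the scheduler until the next reboot.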

2. Disable unnecessary daemons to save memory and CPU resources

  1. Many daemons or services are usually unnecessary. They consume valuable memory and CPU time and expose the server to risk.

  2. Disabling them shortens startup time and frees memory.

  3. It also reduces the number of processes the CPU has to handle.

Some Linux daemons that run automatically by default and should be disabled:

No.  Daemon    Description
1    apmd      Advanced power management daemon
2    nfslock   NFS file locking
3    isdn      ISDN modem support
4    autofs    Automatically mounts file systems in the background (for example, CD-ROMs)
5    sendmail  Mail transfer agent
6    xfs       X Window font server

3. Related GUI

4. Clean up unnecessary modules or functions

Server software packages often have many features or modules enabled that are not actually needed (such as many of the modules in Apache). Disabling them increases the amount of available system memory, frees resources for the software that really needs them, and lets that software run faster.

5. Disable the control panel

In Linux, there are many popular control panels, such as Cpanel, Plesk, Webmin, and phpMyAdmin. Disabling them releases about 120MB of memory, and memory usage drops by about 30-40%.

6. Improve Linux Exim server performance

A DNS caching daemon reduces the bandwidth and CPU time needed to resolve DNS records. DNS caching improves network performance by removing the need to look up DNS records from the root servers every time.

Djbdns is a very powerful DNS server with a DNS caching function. It is more secure and performs better than the BIND DNS server. It can be downloaded directly from http://cr.yp.to/ or obtained as a software package provided by Red Hat.

7. Use AES256 to enhance gpg file encryption security

To improve the security of backup files or sensitive information, many Linux system administrators encrypt them with gpg. When using gpg, it is best to tell it to use the AES256 encryption algorithm, which uses a 256-bit key. AES256 is an open encryption algorithm, and the National Security Agency (NSA) uses it to protect top-secret information.

8. Remote backup service security

Security is the most important factor in choosing a remote backup service. Most system administrators fear two things: that hackers can delete the backup files, and that the system cannot be restored from backup.

To keep the backup files secure, backup service companies provide a remote backup server to which data is transferred over SSH using scp scripts or rsync. Nobody can log in to the remote system directly, so nobody can delete data from the backup service. When choosing a remote backup provider, assess the robustness of its service from several angles and, if possible, test it yourself.

9. Update the default kernel parameter settings

To run enterprise applications such as database servers smoothly, some default kernel parameters may need to be updated. For example, in 2.4.x series kernels the default value of the message queue parameter msgmni (and likewise of the shared memory parameter shmmax, which defaults to only 33554432 bytes on Red Hat systems) allows only a limited number of concurrent connections to the database. The following suggested values help a database server run better (from the IBM DB2 support website):

kernel.shmmax=268435456 (32-bit)
kernel.shmmax=1073741824 (64-bit)
kernel.msgmni=1024
fs.file-max=8192
kernel.sem="250 32000 32 1024"

10. Optimize TCP

Optimizing TCP helps increase network throughput. For communication across a WAN, the larger the bandwidth and the longer the delay, the more a larger TCP window size helps to raise the data transfer rate. The TCP window size determines how much data the sending host can send to the receiving host before it has received an acknowledgement.
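A common way to estimate a suitable window size is the bandwidth-delay product: link bandwidth multiplied by round-trip time. The sketch below does this arithmetic; bdp_bytes is a hypothetical helper name, and the 100 Mbit/s link with an 80 ms round trip is an invented example, not a measurement from this article.

```shell
# bdp_bytes BANDWIDTH_MBIT RTT_MS: bandwidth-delay product in bytes.
# This is the amount of data that can be in flight before an acknowledgement returns.
bdp_bytes() {
    echo $(( $1 * 1000000 / 8 * $2 / 1000 ))
}

# 100 Mbit/s link with an 80 ms round-trip time.
bdp_bytes 100 80
```

The result is the window a single connection would need to keep such a link full. On Linux the maximum buffer sizes are controlled by sysctl settings such as net.core.rmem_max and net.ipv4.tcp_rmem; suitable values depend on the workload.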

11. Choose the right file system

Use the ext4 file system instead of ext3:

● ext4 is an enhanced version of the ext3 file system and raises the storage limits

● Journaling ensures a high level of data integrity (in the event of an abnormal shutdown)

● No disk check is needed when restarting after an abnormal shutdown (a very time-consuming operation)

● Faster writes: the ext4 journal optimizes hard disk head movement

12. Use the noatime file system mount option

Add the noatime option to the file system's entry in the startup configuration file /etc/fstab. noatime stops the kernel from updating a file's access time on every read. If external storage is used, this mount option can noticeably improve performance.

13. Adjust the Linux file descriptor limit

Linux limits the number of file descriptors that any one process can open; the default limit is 1024 per process. This limit may prevent benchmark clients (such as httperf and apachebench) and the web server itself from reaching their best performance. Apache uses one process per connection and is therefore not affected, but single-process web servers such as Zeus use one file descriptor per connection and easily run into the default limit.

The open-file limit can be adjusted with the ulimit command. ulimit -aS shows the current (soft) limits and ulimit -aH shows the hard limits (the hard limit cannot be raised until the kernel parameters under /proc are adjusted).
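To look at the open-file limit alone rather than all limits, the -n option can be combined with -S and -H. A minimal sketch; fd_limits is a hypothetical helper name.

```shell
# fd_limits: print the soft and hard open-file limits of the current shell.
fd_limits() {
    echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

fd_limits
```

Within a session, ulimit -n followed by a number raises the soft limit for that shell and its children, up to the hard limit. The change is not persistent; how to make it permanent depends on the distribution.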

Performance tips for third-party applications on Linux

For third-party applications running on Linux, there are also many performance optimization techniques that can help you improve the performance of Linux servers and reduce operating costs.

14. Configure MySQL correctly

To allocate more memory to MySQL, increase the MySQL cache sizes. If the MySQL server instance is using too much memory, reduce the cache sizes; if MySQL stalls as requests increase, increase them.

15. Configure Apache correctly

Check how much memory Apache uses, then adjust the StartServers and MinSpareServers parameters to free memory, which can save 30-40% of memory.

16. Analyze Linux server performance

The best way to improve system efficiency is to find the bottleneck that slows everything down and remove it. Here are some basic techniques for finding the key bottleneck:

● If the computer slows down when large applications such as OpenOffice and Firefox run at the same time, it is probably short of memory.

● If an application takes a long time to start but runs normally once loaded, the hard disk is probably too slow.

● If the load stays high and memory is sufficient but CPU utilization is very low, use a CPU load analysis tool to monitor the load over time.

17. Learn 5 Linux performance commands

A few commands are enough to manage the performance of a Linux system. The five most commonly used are top, vmstat, iostat, free, and sar, which help system administrators solve performance problems quickly.

(1)top

Shows the tasks currently being served by the kernel, together with host statistics such as uptime, system load, number of processes, and memory usage. By default it refreshes automatically every few seconds (3 seconds in the procps version of top).

It also lists the processes that use the most CPU time, with details about each one such as the owning user and the command being executed.

(2)vmstat

The vmstat command provides a snapshot of current CPU, I/O, process, and memory usage. Like top, it can refresh its data automatically at a given interval, for example:

$ vmstat 10

(3)iostat

iostat provides three reports: CPU utilization, device utilization, and network file system utilization. The reports can be displayed separately with the -c, -d, and -n parameters (the network file system report exists only in older sysstat versions).

(4)free

Displays statistics for main memory and swap space. The -t parameter adds a line with totals, -b shows values in bytes, and -m shows them in megabytes. The default unit is kilobytes.

free can also run continuously with the -s parameter followed by a delay in seconds, for example:

$ free -s 5

(5) sar

Collects, views, and records performance data. This command has a longer history than the others and can collect and display data over longer periods.

Other

Here are some further performance tips:

18. Transfer log files to memory

While a machine is running, it is best to keep the system log in memory and copy it to the hard disk when the system shuts down. On a laptop or mobile device with syslog enabled, ramlog can help extend battery life or the life of the device's flash drive. One advantage of ramlog is that you no longer need to worry about a daemon that sends a message to syslog every 30 seconds, which would otherwise keep the disk running and drain the battery.

19. Pack first, then write

A fixed-size area of memory is set aside for the log files, so the laptop's hard disk does not have to keep running; it runs only when a daemon actually needs to write the log to disk. Note that the memory used by ramlog must be fixed, otherwise system memory would be used up quickly. If the laptop has a solid state drive, 50-80MB of memory can be allocated to ramlog. ramlog eliminates many write cycles and greatly extends the service life of the solid state drive.

20. General tuning tips

Use static content instead of dynamic content wherever possible. If you generate weather forecasts or other data that must be updated every hour, it is better to write a program that produces a static file every hour than to let every user run a CGI program that generates the report dynamically.

Choose the fastest and most suitable API for dynamic applications. CGI may be the easiest to program, but it spawns a process for every request, which is usually costly and unnecessary. FastCGI is a better choice and, like Apache's mod_perl, can greatly improve application performance.


Origin blog.csdn.net/grl18840839630/article/details/112306339