Linux - the uptime command and load average in detail


The uptime command displays how long the system has been running and the system's load average.

Instructions for use:

Usage:
	uptime [options]
Options:
	-p, --pretty   print the uptime in a human-readable format
	-h, --help     display help information
	-s, --since    print the time the system was booted
	-V, --version  display version information

Example:

[root@lechang ~]# uptime
15:38:38 up 116 days,  1:36,  2 users,  load average: 4.52, 3.96, 2.47
  • 15:38:38: the current system time
  • up 116 days, 1:36: how long the system has been running (116 days, 1 hour and 36 minutes)
  • 2 users: the number of users currently logged in
  • load average: 4.52, 3.96, 2.47: the load average over the past 1 minute, 5 minutes, and 15 minutes
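As a minimal sketch, the three load values can be pulled out of an uptime-style line with standard text tools. The function name parse_load is a hypothetical helper, and the sketch assumes a locale that uses "." as the decimal separator. The same three values are also the first three fields of /proc/loadavg.

```shell
# Sample line copied from the example above; on a live system use: line=$(uptime)
line='15:38:38 up 116 days,  1:36,  2 users,  load average: 4.52, 3.96, 2.47'

# parse_load (hypothetical helper): print the 1-, 5-, and 15-minute
# load averages from an uptime-style line, separated by spaces.
parse_load() {
    printf '%s\n' "$1" | sed 's/.*load average: *//' | tr -d ','
}

parse_load "$line"   # prints: 4.52 3.96 2.47
```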

What is load average

The load average is the average number of processes in the runnable or uninterruptible state per unit of time, that is, the average number of active processes. It is not directly related to CPU usage.

  • A runnable process is one that is using the CPU or waiting for the CPU, that is, a process in the R state (Running or Runnable) as shown by the ps command.
  • A process in the uninterruptible state is in the middle of a critical kernel-mode operation that must not be interrupted. The most common example is waiting for an I/O response from a hardware device. These are the processes shown in the D state (Uninterruptible Sleep, also known as Disk Sleep) by the ps command.
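To make the two states concrete, the sketch below counts R and D entries in a list of process states. The helper name count_active is hypothetical, and the input here is a hand-written sample rather than real output. Note that this is an instantaneous snapshot, while the load average is averaged over time, so the two numbers will not match exactly.

```shell
# count_active (hypothetical helper): read one process state per line
# (e.g. "R", "S", "D") and report how many are runnable (R) and how many
# are in uninterruptible sleep (D). Only the first letter is inspected.
count_active() {
    awk '{ s = substr($1, 1, 1); if (s == "R") r++; else if (s == "D") d++ }
         END { printf "R=%d D=%d\n", r, d }'
}

# Hand-written sample input: two R, two S, one D.
printf 'R\nS\nD\nS\nR\n' | count_active   # prints: R=2 D=1
```

On a live system the function can be fed from ps, for example `ps -eo state= | count_active`.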

Ideally, the load average equals the number of CPUs, so that every CPU is fully used and no process has to wait. Once the load average exceeds 70% of the number of CPUs, it is time to analyze and troubleshoot the cause of the high load.

# Check the number of CPUs
grep 'model name' /proc/cpuinfo | wc -l
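The 70% rule of thumb above can be sketched as a small check. The function name load_is_high is a hypothetical helper, and the 4-CPU figure in the sample call is an assumed value for illustration only.

```shell
# load_is_high (hypothetical helper): succeed (exit status 0) when the load
# exceeds 70% of the CPU count.
#   $1 = load average, $2 = number of CPUs
load_is_high() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load > 0.7 * cpus) }'
}

# Sample values: the 1-minute load from the uptime example and an assumed
# 4-CPU machine (4.52 > 0.7 * 4 = 2.8).
if load_is_high 4.52 4; then
    echo "load is high"
else
    echo "load is ok"
fi
```

On a live system the arguments could come from `cut -d' ' -f1 /proc/loadavg` and from the CPU count command above.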

Reading the system's current load average

# watch -d: highlight the parts of the output that change
watch -d uptime
  • If the 1-minute, 5-minute, and 15-minute values are the same or differ only slightly, the system load is stable.
  • If the 1-minute value is much smaller than the 15-minute value, the load has dropped in the last minute, but the system was under heavy load during the past 15 minutes.
  • If the 1-minute value is much greater than the 15-minute value, the load has risen in the last minute. The rise may be temporary or it may continue, so keep observing. Once the 1-minute load average approaches or exceeds the number of CPUs, the system is overloaded. At that point, analyze where the load comes from and find a way to reduce it.
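The three readings above can be sketched as a simple comparison of the 1-minute and 15-minute values. The helper name load_trend is hypothetical, and the tolerance of 0.5 is an arbitrary illustrative value, not a standard threshold; what counts as "much greater" depends on the machine.

```shell
# load_trend (hypothetical helper): classify the trend from two averages.
#   $1 = 1-minute load, $2 = 15-minute load, $3 = tolerance (assumed default 0.5)
load_trend() {
    awk -v one="$1" -v fifteen="$2" -v tol="${3:-0.5}" 'BEGIN {
        diff = one - fifteen
        if (diff > tol)       print "rising"
        else if (diff < -tol) print "falling"
        else                  print "stable"
    }'
}

# Sample values from the uptime example: 4.52 (1 min) vs 2.47 (15 min).
load_trend 4.52 2.47   # prints: rising
```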

Troubleshooting ideas for high average load problems

CPU-intensive process case:

  1. Check the CPU usage: see whether %usr on some CPU is high while %iowait stays low
# Show metrics for all CPUs, printing one report after a 5-second interval
mpstat -P ALL 5 1
  2. Find the process responsible: look for a process whose %CPU is high while its %wait is low; that process is very likely the cause of the high CPU usage
# Print one report after a 5-second interval; -u selects CPU metrics
pidstat -u 5 1

IO-intensive process case:

  1. Check the CPU usage: see whether %iowait on some CPU is high while %usr stays comparatively low
# Show metrics for all CPUs, printing one report after a 5-second interval
mpstat -P ALL 5 1
  2. Find the process responsible: look for the process with elevated %wait and elevated %CPU
# Print one report after a 5-second interval; -u selects CPU metrics
pidstat -u 5 1

Large number of processes case:

  1. Find the processes responsible: check whether many processes show a high %wait
# Print one report after a 5-second interval; -u selects CPU metrics
pidstat -u 5 1
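As a rough sketch, the first step of the CPU-intensive and I/O-intensive cases can be written as a decision on two mpstat readings. The helper name classify_cpu is hypothetical, and the 50% cutoff is an assumed value for illustration; real thresholds depend on the workload.

```shell
# classify_cpu (hypothetical helper): guess the kind of load from two
# per-CPU percentages as reported by mpstat.
#   $1 = %usr, $2 = %iowait, $3 = cutoff in percent (assumed default 50)
classify_cpu() {
    awk -v usr="$1" -v iowait="$2" -v hi="${3:-50}" 'BEGIN {
        if (iowait >= hi)   print "io-intensive"
        else if (usr >= hi) print "cpu-intensive"
        else                print "inconclusive"
    }'
}

# Hypothetical readings, not taken from a real run.
classify_cpu 95.0 0.5    # prints: cpu-intensive
classify_cpu 3.0 67.5    # prints: io-intensive
```

An "inconclusive" result is the cue to move on to the per-process view from pidstat, for example to check whether many processes are waiting at once.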

Origin blog.csdn.net/weixin_44988085/article/details/128694110