[Linux Basics Introduction] (14) Linux Process Management

1 Introduction

Content

  • Learn the tools Linux provides for viewing and controlling processes. With them, we can check the relevant metrics promptly when a process misbehaves, and track down the problem.

Knowledge points

  • Viewing the running status of processes
  • Controlling and terminating processes
  • The execution order of processes

2 Viewing processes

Sooner or later we run into misbehaving processes, so Linux provides several tools for checking process status. top shows the state of processes in real time, together with system-wide information such as CPU and memory usage; ps gives a static snapshot of the current processes; and pstree displays the currently active processes as a tree.

2.1 Using top

top is one of the most commonly used monitoring tools. It shows, in real time, how key system metrics change:

top

top runs in the foreground, so once started it presents an interactive interface in which we can type commands to sort and filter the display. Before using those commands, let's first look at what information is displayed.

The first row displayed by top contains:

content Explanation
top Name of the current program
11:29:56 Current system time
up 2:32 How long the system has been running since boot (days are included once it exceeds 24 hours)
1 user Number of users currently logged in
load average: 0.28, 0.14, 0.05 System load averaged over the last 1, 5, and 15 minutes
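The three load figures come from the kernel, which exposes them in /proc/loadavg (the uptime command prints the same information as top's first row). The helper below is a minimal sketch, assuming the /proc/loadavg layout described in proc(5): three load averages, then runnable/total scheduling entities, then the most recently created PID. The function name loadavg_fields is made up for this example.

```shell
# Minimal sketch: split one line in /proc/loadavg format into named parts.
# Layout (see proc(5)): 1-min 5-min 15-min runnable/total last-PID,
# e.g. "0.28 0.14 0.05 2/296 12345". "total" counts threads as well as processes.
loadavg_fields() {
  awk '{
    split($4, rq, "/")
    printf "1min=%s 5min=%s 15min=%s runnable=%s total=%s\n", $1, $2, $3, rq[1], rq[2]
  }'
}

# Example on a live system:
#   loadavg_fields < /proc/loadavg
```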

Wikipedia describes system load as a measure of the amount of computational work a computer system performs. The load average is an average of the run-queue length over time, that is, of the number of processes running on or waiting for the CPU. On Linux, processes in uninterruptible sleep (usually waiting for disk I/O) are counted as well.

How should we interpret the load average?

Assume a single-CPU, single-core system. Think of the CPU as a one-lane bridge and each CPU task as a car.

  • load = 0: there are no cars on the bridge; the CPU has no tasks.
  • load < 1: there are few cars on the bridge and traffic flows smoothly; the CPU has spare capacity.
  • load = 1: the bridge is completely full, with no gaps; the CPU is working at full capacity. It is still coping, but things start to slow down.
  • load > 1: the bridge is full and more cars are queuing at its entrance; the CPU is saturated and many processes are waiting for it. A value of 2 or 3 means the demand is two to three times what the CPU can handle; a value above 5 means the system is seriously overloaded.

This is the single-core case. On a real machine, divide the load average by the number of CPU cores. The following commands show how many CPUs and cores the machine has:

# Number of physical CPUs (sockets)
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l

# Number of cores in each physical CPU
cat /proc/cpuinfo | grep "cpu cores" | uniq

# Total number of logical processors (all cores, plus hyper-threads if enabled)
grep -c "^processor" /proc/cpuinfo
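To apply the per-core rule, divide a load value by the CPU count. This is a minimal sketch, assuming awk is available; per_core_load is a hypothetical helper name, and the example comment uses nproc (GNU coreutils) to count the logical CPUs available to the current process.

```shell
# per_core_load LOAD NCPU: divide a load-average value by a CPU count (hypothetical helper).
per_core_load() {
  awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# Example: 1-minute load per logical CPU on this machine
#   per_core_load "$(cut -d ' ' -f1 /proc/loadavg)" "$(nproc)"
```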

For a single core the critical value is 1, but experienced operations engineers and system administrators usually treat 0.7 as the threshold. All values below are per core, i.e. the load average divided by the number of cores:

  • load < 0.7: no need to pay attention;
  • 0.7 < load < 1: keep an eye on it; the system still copes, but it is close to the limit;
  • load = 1: be vigilant; there is no spare capacity left and the CPU is already doing all it can;
  • load > 5: the system is barely responsive, and the problem needs to be fixed urgently, even if it means working overtime.
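The rule of thumb above can be written down as a small classifier. This is only a sketch: the 0.7, 1 and 5 cut-offs come from the list, while grouping everything between 1 and 5 into a single "busy" band, and the labels themselves, are assumptions, because the list does not cover that range explicitly.

```shell
# classify_load PER_CORE_LOAD: map a per-core load to the bands listed above.
# Cut-offs 0.7, 1 and 5 come from the list; the single "busy" band for 1..5
# and the labels are assumptions made for this sketch.
classify_load() {
  awk -v l="$1" 'BEGIN {
    if (l < 0.7)     print "ok"          # no need to pay attention
    else if (l < 1)  print "watch"       # still coping, but close to the limit
    else if (l <= 5) print "busy"        # no spare capacity: be vigilant
    else             print "overloaded"  # barely responsive
  }'
}

# Example: per-core 1-minute load on this machine
#   classify_load "$(awk -v n="$(nproc)" '{ print $1 / n }' /proc/loadavg)"
```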

Usually we look at the 15-minute value first to see the overall trend, then compare it with the 5-minute value to see whether the load is going down.
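A rough way to automate this comparison is sketched below: it reads a line in /proc/loadavg format and reports whether the 5-minute value is above or below the 15-minute value. The function name load_trend and the 0.05 tolerance for "steady" are arbitrary choices for illustration, not a standard.

```shell
# load_trend: read a line in /proc/loadavg format and compare the 5-minute value
# ($2) with the 15-minute value ($3). The 0.05 tolerance is an arbitrary choice.
load_trend() {
  awk '{
    d = $2 - $3
    if (d > 0.05)       print "rising"
    else if (d < -0.05) print "falling"
    else                print "steady"
  }'
}

# Example on a live system:
#   load_trend < /proc/loadavg
```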

The second row of top summarizes the processes:

content Explanation
Tasks: 296 total Total number of processes
2 running 2 running processes
222 sleeping 222 sleeping processes
1 stopped 1 stopped process
1 zombie 1 zombie process
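These counts can be cross-checked with ps. The sketch below, built around the hypothetical helper count_states, tallies the first letter of each STAT code (R, S, D, T, Z, and so on). ps takes its snapshot at a different moment from top and may group states differently, so expect totals that are close to top's, not necessarily identical.

```shell
# count_states: read process STAT codes, one per line, and count them by state.
# Only the first letter is the state (R, S, D, T, Z, ...); the rest are modifiers.
count_states() {
  awk '{ n[substr($1, 1, 1)]++ } END { for (s in n) print s, n[s] }' | sort
}

# Example on a live system ("stat=" suppresses the header line):
#   ps -eo stat= | count_states
```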

The third row of top shows CPU usage:

content Explanation
Cpu(s): 2.0% us Percentage of CPU time spent in user-space processes
0.8% sy Percentage of CPU time spent in kernel space
0.0% ni Percentage of CPU time spent in user processes whose priority (nice value) has been changed
97.2% id Percentage of CPU time idle
0.0% wa Percentage of CPU time spent waiting for I/O to complete
0.0% hi Percentage of CPU time spent servicing hardware interrupts (hardware IRQs)
0.0% si Percentage of CPU time spent servicing software interrupts (software IRQs)
0.0% st Steal time: percentage of time a virtual CPU waits for a real CPU while the hypervisor is serving another virtual processor

CPU utilization is a statistic of CPU usage over a period of time: it shows how busy the CPU was during that period. The load average is different. It does not describe CPU utilization; it is a statistic of how many processes were being handled by the CPU or waiting for it over a period of time. The two metrics should not be confused.
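Why utilization needs a time window becomes clear from where the numbers come from: the kernel only exposes cumulative CPU counters (the "cpu" line of /proc/stat, in clock ticks since boot), so a percentage such as us or id is derived from the difference between two samples. The sketch below assumes the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal) and treats idle + iowait as "not busy", a common convention rather than exactly what top prints. cpu_usage_pct is a hypothetical name.

```shell
# cpu_usage_pct PREV CURR: CPU utilization (%) between two samples of the aggregate
# "cpu" line of /proc/stat. Fields after "cpu": user nice system idle iowait irq
# softirq steal ... Guest time is already included in user, so only these eight
# counters are summed. idle + iowait are treated as not busy.
cpu_usage_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '{
    idle = $5 + $6                           # idle + iowait
    total = 0
    for (i = 2; i <= 9; i++) total += $i     # user .. steal
    if (NR == 1) { prev_idle = idle; prev_total = total; next }
    dt = total - prev_total
    printf "%.1f\n", (dt > 0 ? (dt - (idle - prev_idle)) * 100 / dt : 0)
  }'
}

# Example: sample the counters one second apart
#   a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat)
#   cpu_usage_pct "$a" "$b"
```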

The fourth row of top shows memory usage (in KiB):

content Explanation
2017496 total Total physical memory
1329432 used Physical memory in use
101472 free Free physical memory
586592 buffers Memory used as kernel buffers (newer versions of top label this field buff/cache and include the page cache in it)

Note: the maximum amount of physical memory available to the system is not just the free value but free + buffers + cached, where cached is the last figure in the swap row of older versions of top (newer versions show a ready-made avail Mem figure there instead).
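The same estimate can be computed from /proc/meminfo, which is where top and free read their memory figures. This sketch, with the hypothetical helper name mem_avail_kb, prints the free + buffers + cached sum from the note next to MemAvailable, the kernel's own estimate of memory available for new programs (present since Linux 3.14; the helper prints n/a when the field is missing).

```shell
# mem_avail_kb: read /proc/meminfo-formatted text and print two estimates in kB:
#   sum    = MemFree + Buffers + Cached   (the rule of thumb from the note)
#   kernel = MemAvailable                 (the kernel estimate, if present)
mem_avail_kb() {
  awk '
    $1 == "MemFree:"      { free = $2 }
    $1 == "Buffers:"      { buffers = $2 }
    $1 == "Cached:"       { cached = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END { printf "sum=%d kernel=%s\n", free + buffers + cached, (avail == "" ? "n/a" : avail) }'
}

# Example on a live system:
#   mem_avail_kb < /proc/meminfo
```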

The fifth row of top shows swap usage:

content Explanation
total Total swap space
used Swap space in use
free Free swap space
cached Physical memory used as page cache (older versions of top print it at the end of this row, but it is not swap space)

The rest of the screen lists individual processes:

Column name Explanation
PID Process ID
USER User who owns the process
PR Scheduling priority of the process (see below)
NI Nice value of the process
VIRT Total virtual memory used by the process
RES Physical memory used by the process, also called resident memory
SHR Shared memory used by the process
S Process state: R = running, S = sleeping, D = uninterruptible sleep, T = stopped, Z = zombie
%CPU Percentage of CPU time used by the process since the last screen update
%MEM Percentage of physical memory used by the process
TIME+ Total CPU time the process has used since it started
COMMAND Command name of the process

The nice value is a static priority set from user space, ranging from -20 to 19. The smaller the value, the higher the process's priority: -20 is the highest priority, 0 is the default, and 19 is the lowest.

PR is the dynamic priority, i.e. the priority the kernel actually uses for the process. The range of kernel priorities is defined by the macro MAX_PRIO, whose value is 140, so Linux implements 140 priority levels, 0 to 139; the smaller the value, the higher the priority. Levels 0 to 99 are used by real-time processes and 100 to 139 by ordinary user processes.

For ordinary processes, the kernel priority in the 100 to 139 range equals 120 + nice, and top displays it shifted down by 100, so PR = 20 + nice, where nice runs from -20 to +19 (PR 0 to 39). The two values are therefore closely related, but their numeric ranges and their roles differ.
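The mapping between nice and PR is plain arithmetic. The function below is a hypothetical helper, valid only for ordinary (non-real-time) processes, that prints both the PR column top shows and the underlying kernel priority.

```shell
# nice_to_priority NICE: PR as shown by top (20 + nice) and the kernel priority
# (120 + nice) of an ordinary, non-real-time process. Hypothetical helper.
nice_to_priority() {
  if [ "$1" -lt -20 ] || [ "$1" -gt 19 ]; then
    echo "nice value must be between -20 and 19" >&2
    return 1
  fi
  echo "PR=$((20 + $1)) kernel=$((120 + $1))"
}

# Example:
#   nice_to_priority 0     # PR=20 kernel=120 (the default)
#   nice_to_priority -20   # PR=0 kernel=100  (highest non-real-time priority)
```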

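As mentioned above, top runs in the foreground, so it is interactive. Commonly used interactive commands:

Interactive command Explanation
q Quit the program
l Toggle the display of the load average and uptime line
P Sort by CPU usage percentage
M Sort by resident memory size
i Ignore idle and zombie processes (a toggle)
k Kill a process: top prompts for the PID and the signal to send. Signal 15 is normally used to terminate a process; if it does not exit, use signal 9. This command is disabled in secure mode.

Used well, top is a very effective way to locate system bottlenecks and problems.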

2.2 Using ps

ps is one of the most commonly used tools for viewing processes. Let's see what information it gives us with the following commands:

ps aux


ps axjf

Here is what these fields mean:

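Column Explanation
F Process flags: 1 means the process has forked but not called exec; 4 means it used superuser (root) privileges
USER User who owns the process
PID Process ID
PPID PID of the parent process
SID Session ID
TPGID ID of the foreground process group on the process's terminal
%CPU Percentage of CPU used by the process
%MEM Percentage of memory used by the process
NI Nice value of the process
VSZ Virtual memory size of the process (KiB)
RSS Resident set size: physical memory used by the process (KiB)
TTY Terminal the process is attached to
S or STAT Process state
WCHAN Kernel resource (wait channel) the process is waiting on
START Time the process started
TIME CPU time consumed by the process
COMMAND Command name and arguments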

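In the STAT column, D means uninterruptible sleep. A process in this state does not react to any signal until it leaves that state, so it cannot be killed with kill, whether kill -9 or kill -15. A process stuck in D usually indicates a problem with I/O.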
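To spot processes stuck in this state, filter the STAT column for D. This is a minimal sketch; list_d_state is a hypothetical name, and it expects PID, STAT and command columns in that order, as produced by the ps invocation in the example comment.

```shell
# list_d_state: read "PID STAT COMMAND" lines and print the PID and command of
# every process whose state is D (uninterruptible sleep).
list_d_state() {
  awk '$2 ~ /^D/ { print $1, $3 }'
}

# Example on a live system (the trailing "=" suppresses the header line):
#   ps -eo pid=,stat=,comm= | list_d_state
```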

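ps has many options; some of the most commonly used are explained below.

The -l option lists, in long format, the processes belonging to the current bash login session: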

ps -l

In practice, the following command is used more often; it lists all processes:

ps aux

To find a particular process, pipe the output to grep, which also accepts regular expressions:

ps aux | grep zsh

ps can also display processes as a tree, showing which process started which:

ps axjf

You can also choose which fields to display:

# Do not put spaces after the commas
ps -afxo user,ppid,pid,pgid,command


2.3 Using pstree

pstree shows at a glance how many instances of the same process are running (identical branches are merged and shown as N*[name]) and, most importantly, how all processes are related to one another.

pstree


pstree -up

# Options:
# -A : draw the tree with ASCII characters
# -p : also show the PID of each process
# -u : show the user name whenever a process runs as a different user from its parent
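The relationships pstree draws come from each process's parent PID (PPID). To illustrate the idea (this is not how pstree itself is implemented), the awk sketch below rebuilds an indented tree from PID, PPID and command columns; print_tree is a hypothetical helper name.

```shell
# print_tree [ROOT_PID]: read "PID PPID COMMAND" lines on stdin and print the
# processes below ROOT_PID (default 1) as an indented parent/child tree.
print_tree() {
  awk -v root="${1:-1}" '
    {
      cmd = $0
      sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "", cmd)   # drop the PID and PPID columns
      name[$1] = cmd
      kids[$2] = kids[$2] " " $1                       # append this PID to its parent list
    }
    function show(pid, depth,    pad, list, n, i) {
      pad = ""
      for (i = 0; i < depth; i++) pad = pad "  "
      printf "%s%s(%s)\n", pad, name[pid], pid
      n = split(kids[pid], list, " ")
      for (i = 1; i <= n; i++) show(list[i], depth + 1)
    }
    END { show(root, 0) }'
}

# Example on a live system:
#   ps -eo pid=,ppid=,comm= | print_tree
```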


3 Process management

3.1 Mastering the kill command

When a process ends, or is about to end abnormally, its parent process is notified (with the SIGCHLD signal), and a process may also receive a SIGHUP signal, for example when its terminal is closed, which makes it exit or take some other action. These signals are not only sent by the system: we can send SIGHUP and other signals ourselves with kill, to end a process, make it restart, and so on.

In the previous lesson, we used kill to manage jobs. In this lesson, we use kill on processes that are not jobs of the current shell, addressing them directly by PID:

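# First open gedit and vim (for example from the graphical interface); ps shows them
ps aux

# Force gedit to exit with signal 9 (4561 is gedit's PID in this example)
kill -9 4561

# Searching again no longer finds gedit (only the grep command itself matches)
ps aux | grep gedit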
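kill accepts either a signal number or a signal name. The shell's kill -l turns a number into a name, which is a quick way to check the signals used in this lesson; on Linux, 1, 9 and 15 are HUP, KILL and TERM. The PID 4561 in the comments is just the example PID from above.

```shell
# Look up the names of the signals used in this lesson with the shell's kill -l.
for sig in 1 9 15; do
  echo "$sig = SIG$(kill -l "$sig")"
done
# Prints:
#   1 = SIGHUP
#   9 = SIGKILL
#   15 = SIGTERM

# So these commands all send SIGTERM to the example PID 4561 (TERM is kill's default):
#   kill 4561
#   kill -15 4561
#   kill -TERM 4561
```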


3.2 Process execution order

The output of ps shows that most processes are sleeping. When several of them wake up, which one gets the CPU first, and in what order do the others follow? How should the scheduling queue be arranged?

This is decided by each process's priority, which is controlled and reflected by the PR and nice values described above.

The nice value can be set with the nice command. Its range is -20 to 19. The root user has full control: it can adjust its own processes and those of other users, and it can use the whole range. Ordinary users can adjust only their own processes, and only within 0 to 19; the system imposes this limit to prevent ordinary users from grabbing system resources.

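# Start a program in the background (or open one from the graphical interface);
# a negative value such as -5 requires root
nice -n -5 vim &

# Check its priority with ps
ps -afxo user,ppid,pid,stat,pri,ni,time,command | grep vim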
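To see the effect of nice without starting an editor, use nice itself as the child command: GNU coreutils nice prints the current niceness when run without a command. The wrapper name child_niceness is hypothetical. Note that nice -n adjusts the niceness relative to the current value, and the result is capped at 19.

```shell
# child_niceness N: niceness a command gets when started with "nice -n N".
# Uses GNU coreutils nice as the child command: run without a command, it prints
# the current niceness. nice -n is relative to the current value and capped at 19.
child_niceness() {
  nice -n "$1" nice
}

# Examples:
#   nice               # niceness of the current shell, normally 0
#   child_niceness 5   # 5 if the shell runs at 0: anyone may lower their priority
#   child_niceness -5  # as an ordinary user, GNU nice warns "cannot set niceness"
#                      # and runs the command at the unchanged niceness
```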

We can also use renice to change the priority of a process that is already running:

renice -5 pid


4 Summary

We learned the process-viewing commands ps, pstree, and top, and what the information they display means, so that we can find the information we need. We also learned the process management commands kill, nice, and renice.


Origin blog.csdn.net/happyjacob/article/details/107051107