Learning Linux process management

Disclaimer: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_42415326/article/details/91126630

Linux provides several tools for viewing process status information.

top shows the status of processes and of the system (CPU, memory, and so on) in real time.

ps shows a static snapshot of the current processes.

pstree shows the currently active processes as a tree.

1 Using the top tool

top is one of the most commonly used viewing tools. It shows key system information as it changes in real time:

top

top runs in the foreground, so launching it opens an interactive screen where we can watch system and process information in real time. Inside that screen we can operate on and filter the output with a few commands. First, let's understand what information is displayed.

 

The first line of the top display:

| Content | Explanation |
| --- | --- |
| top | Name of the current program |
| 11:05:18 | Current system time |
| up 8 days, 17:12 | How long the machine has been running |
| 1 user | Number of users currently logged in (here, one) |
| load average: 0.29, 0.20, 0.25 | Average system load over the last 1, 5, and 15 minutes |

Wikipedia describes load average this way: the system load is a measure of the amount of work that a computer system is doing. It measures the CPU's current workload; specifically, it is the average length of the run queue, that is, the average number of processes running on or waiting for the CPU over a period of time.

How do we read the load average numbers?

Assume the system has a single CPU with a single core. Think of it as a one-lane bridge, with the CPU's tasks as the cars.

  • load = 0: the bridge is empty; the CPU has no tasks.
  • load < 1: there are few cars on the bridge and traffic flows smoothly; the CPU has few tasks and resources to spare.
  • load = 1: the bridge is exactly full, with no gaps; the CPU is working at full capacity and all resources are in use. It can still cope, but things are a little slow.
  • load > 1: the bridge is full and cars are queuing outside it; the CPU is at full capacity, system resources are exhausted, and many process requests are still waiting. A value above 2 or 3 means the requests amount to 2 to 3 times what the CPU can handle, and a value above 5 means the system is overloaded.

That is the single-CPU, single-core case. In practice we need to divide this value by the number of cores. The following commands show the number of CPUs and the number of cores:

# Number of physical CPUs
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l

# Number of cores per CPU (the "cpu cores" field, as reported on x86)
cat /proc/cpuinfo | grep "cpu cores" | uniq

From the scale above, the critical load value is 1, but in practice experienced operations staff and system administrators set the threshold at 0.7. The values below are the load divided by the number of cores; do not confuse the two.

  • load < 0.7: no need to pay attention.
  • 0.7 < load < 1: keep an eye on it; the system can still cope, but the value is not far from critical.
  • load = 1: be vigilant; there are no spare resources left and the system is fully stretched.
  • load > 5: the system is close to collapse; it is time to work overtime and fix the problem.
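The thresholds above can be sketched as a small shell function. This is only an illustration: the name classify_load and the labels it prints are made up here, and the article does not say what to call the range between 1 and 5, so "saturated" is an assumption.

```shell
# Classify a load average after dividing it by the number of cores.
# Usage: classify_load LOAD CORES
# The labels are illustrative; the 0.7 / 1 / 5 thresholds come from the text.
classify_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        per_core = load / cores
        if (per_core < 0.7)       status = "ok"
        else if (per_core < 1.0)  status = "watch"
        else if (per_core <= 5.0) status = "saturated"   # assumed label for 1..5
        else                      status = "overloaded"
        printf "%.2f %s\n", per_core, status
    }'
}

# Example on a live system (1-minute load, number of online cores):
#   classify_load "$(cut -d " " -f 1 /proc/loadavg)" "$(nproc)"
```

Dividing first and comparing afterwards keeps the thresholds the same on any machine, whatever its core count.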

Usually we look at the 15-minute value first to see the general trend, then compare it with the 5-minute value to see whether the load is falling.
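A minimal sketch of that comparison, assuming the two values are passed in as arguments. The function name load_trend and its three output words are hypothetical.

```shell
# Compare the 5-minute load with the 15-minute load.
# Usage: load_trend FIVE_MIN FIFTEEN_MIN
load_trend() {
    awk -v five="$1" -v fifteen="$2" 'BEGIN {
        if (five + 0 < fifteen + 0)      print "falling"
        else if (five + 0 > fifteen + 0) print "rising"
        else                             print "steady"
    }'
}

# Example on a live system (fields 2 and 3 of /proc/loadavg):
#   load_trend $(cut -d " " -f 2,3 /proc/loadavg)
```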

Reading the busybox source shows that top reads the load values from /proc/loadavg; the kernel checks the number of active processes every 5 seconds and recomputes the averages. How is the load value calculated? This is the kernel source that computes it:

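#define FSHIFT      11          /* nr of bits of precision */
#define FIXED_1     (1<<FSHIFT) /* 1.0 as fixed-point */
#define LOAD_FREQ   (5*HZ)      /* 5 sec intervals: the load average is recalculated every 5 seconds */
#define EXP_1       1884        /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5       2014        /* 1/exp(5sec/5min) */
#define EXP_15      2037        /* 1/exp(5sec/15min) */
#define CALC_LOAD(load, exp, n)     \
         load *= exp;               \
         load += n*(FIXED_1 - exp); \
         load >>= FSHIFT;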

unsigned long avenrun[3];

EXPORT_SYMBOL(avenrun);

/*
* calc_load - given tick count, update the avenrun load estimates.
* This is called while holding a write_lock on xtime_lock.
*/
static inline void calc_load(unsigned long ticks)
{
        unsigned long active_tasks; /* fixed-point */
        static int count = LOAD_FREQ;
        count -= ticks;
        if (count < 0) {
                count += LOAD_FREQ;
                active_tasks = count_active_tasks();
                CALC_LOAD(avenrun[0], EXP_1, active_tasks);
                CALC_LOAD(avenrun[1], EXP_5, active_tasks);
                CALC_LOAD(avenrun[2], EXP_15, active_tasks);
        }
}

The second half of the code, the three CALC_LOAD calls, is the actual calculation.
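To see what CALC_LOAD does, the same fixed-point update can be replayed in shell arithmetic. This is a sketch, not the kernel's code: the function names are invented for illustration, and the simulation assumes exactly one active task at every 5-second sample.

```shell
FSHIFT=11
FIXED_1=$((1 << FSHIFT))   # 2048 represents 1.0
EXP_1=1884                 # 1/exp(5s/1min) in fixed point

# One CALC_LOAD step: new = (old*exp + n*(FIXED_1 - exp)) >> FSHIFT
# Usage: calc_load OLD EXP N   (OLD and N are fixed-point values)
calc_load() {
    echo $(( ($1 * $2 + $3 * (FIXED_1 - $2)) >> FSHIFT ))
}

# Convert a fixed-point load to a decimal string with two places.
fixed_to_decimal() {
    awk -v v="$1" -v one="$FIXED_1" 'BEGIN { printf "%.2f\n", v / one }'
}

# Assumption for the demo: one active task at each of 12 samples (one minute).
simulate_one_minute() {
    load=0
    i=0
    while [ "$i" -lt 12 ]; do
        load=$(calc_load "$load" "$EXP_1" "$FIXED_1")
        i=$((i + 1))
    done
    fixed_to_decimal "$load"
}
```

Each step keeps a fraction exp/FIXED_1 of the old value and mixes in the rest from the current task count, which is an exponentially decaying average. After one minute of a constant single task, the 1-minute figure has climbed to roughly 0.63 rather than 1.0, which is why the load average lags behind sudden changes.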

 

Back to the topic. The second line of top is a summary of process counts:

| Content | Explanation |
| --- | --- |
| Tasks: 26 total | Total number of processes |
| 1 running | Number of running processes (1) |
| 25 sleeping | Number of sleeping processes (25) |
| 0 stopped | Number of stopped processes (0) |
| 0 zombie | Number of zombie processes (0) |
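A similar tally can be produced outside top. The sketch below counts processes by the first letter of their state code; the function name count_states is hypothetical, and it reads state codes from standard input so that it can be fed from ps.

```shell
# Read process state codes on stdin (one per line, e.g. "Ss", "R+")
# and count them by their first letter.
count_states() {
    awk '{ count[substr($1, 1, 1)]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Example on a live system:
#   ps -eo stat= | count_states
```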

The third line of top is CPU usage statistics:

| Content | Explanation |
| --- | --- |
| Cpu(s): 1.0%us | Percentage of CPU used by user-space processes |
| 1.0%sy | Percentage of CPU used by the kernel |
| 0.0%ni | Percentage of CPU used by user-space processes whose priority (nice value) has been changed |
| 97.9%id | Percentage of CPU that is idle |
| 0.0%wa | Percentage of CPU time spent waiting for input/output |
| 0.1%hi | Percentage of CPU time spent on hardware interrupts (Hardware IRQ) |
| 0.0%si | Percentage of CPU time spent on software interrupts (Software IRQ) |
| 0.0%st | Steal time: percentage of time a virtual CPU waits for a real CPU while the hypervisor services other virtual machines |

CPU utilization measures how much of the CPU was in use over a period of time; it shows how busy the CPU was during that period. Load average is the CPU load: it does not measure CPU usage, but counts the processes the CPU was running or that were waiting for the CPU over a period of time. The two indicators are not the same thing.
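To make the contrast concrete, CPU utilization can be derived from two samples of the aggregate cpu line in /proc/stat. This is a rough sketch under stated assumptions: the function name cpu_busy_percent is hypothetical, and counting iowait as idle time is a choice made here, not something the article specifies.

```shell
# Print the busy CPU percentage between two "cpu ..." lines from /proc/stat.
# Usage: cpu_busy_percent "FIRST_LINE" "SECOND_LINE"
cpu_busy_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user .. steal
            idle = $5 + $6                          # idle + iowait (assumption)
            if (NR == 1) { t0 = total; i0 = idle }
            else         { dt = total - t0; di = idle - i0 }
        }
        END { printf "%.1f\n", 100 * (dt - di) / dt }'
}

# Example on a live system:
#   a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat)
#   cpu_busy_percent "$a" "$b"
```

Utilization is a ratio of time, so it can never exceed 100%, while the load average is a count of processes and has no upper bound.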

The fourth line of top is memory usage statistics:

| Content | Explanation |
| --- | --- |
| 8176740 total | Total physical memory |
| 8032104 used | Amount of physical memory in use |
| 144636 free | Amount of free memory |
| 313088 buffers | Amount of memory used as kernel buffers |

Note

The maximum physical memory available to the system is not just free, but free + buffers + the cached value shown on the swap line.
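The sum in the note can be computed directly from /proc/meminfo. The sketch below assumes the MemFree, Buffers, and Cached fields correspond to top's free, buffers, and cached values; the function name estimate_available_kib is made up for illustration.

```shell
# Sum MemFree + Buffers + Cached (in KiB) from a meminfo-style file.
# Usage: estimate_available_kib [FILE]   (defaults to /proc/meminfo)
estimate_available_kib() {
    awk '/^(MemFree|Buffers|Cached):/ { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}

# Example on a live system:
#   estimate_available_kib
```

The pattern is anchored at the start of the line so that SwapCached is not added to the total.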

The fifth line of top is swap usage statistics:

| Content | Explanation |
| --- | --- |
| total | Total amount of swap |
| used | Amount of swap in use |
| free | Amount of free swap |
| cached | Total amount of cached swap: memory contents that were swapped out and later swapped back in, while the swap space they used has not yet been overwritten |

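Below that comes the per-process information:

| Column | Explanation |
| --- | --- |
| PID | Process ID |
| USER | User who owns the process |
| PR | Scheduling priority of the process |
| NI | Nice value of the process |
| VIRT | Total virtual memory used by the process |
| RES | Physical memory used by the process, also called resident memory |
| SHR | Amount of shared memory used by the process |
| S | Process state: S=sleep, R=running, Z=zombie |
| %CPU | CPU usage of the process |
| %MEM | Memory usage of the process |
| TIME+ | Total CPU time the process has used |
| COMMAND | Name of the command the process is running |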

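Note

The NICE value is called the static priority. It is a user-space priority value that ranges from -20 to 19. The smaller the value, the higher the process's priority; the larger the value, the lower the priority. -20 is the highest priority, 0 is the default, and 19 is the lowest.

PR is the Priority value, called the dynamic priority. It is the process's actual priority inside the kernel. The range of priorities is defined by a macro named MAX_PRIO, whose value is 140. Linux therefore implements 140 priority levels, numbered 0 to 139; the smaller the value, the higher the priority. Of these, 0 to 99 are for real-time processes and 100 to 139 are for ordinary user processes.

For the 100 to 139 part of the range, top shows PR = 20 + nice, where nice runs from -20 to +19, so PR runs from 0 to 39. Both values are priorities and are closely related, but their values and their scope are not the same.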
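The mapping can be written out as two one-line functions. The names nice_to_pr and nice_to_kernel_prio are hypothetical; the arithmetic follows the ranges given above, with the kernel's 100 to 139 band being top's PR plus 100.

```shell
# PR value shown by top for an ordinary process with the given nice value.
nice_to_pr() { echo $(( 20 + $1 )); }

# Matching kernel priority in the 100..139 band.
nice_to_kernel_prio() { echo $(( 100 + 20 + $1 )); }

# Example: nice_to_pr -5
```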

VIRT is the total amount of virtual memory used by the task, including all code, data, shared libraries, and pages that have been swapped out.

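As mentioned above, top is a foreground program, so it is interactive:

| Interactive command | Explanation |
| --- | --- |
| q | Quit the program |
| l | Toggle the display of load average and uptime information |
| P | Sort by CPU usage percentage |
| M | Sort by resident memory size |
| i | Ignore idle and zombie processes; this is a toggle |
| k | Terminate a process; top prompts for the PID and the signal to send. Signal 15 is the usual choice; use signal 9 if the process does not end normally. This command is disabled in secure mode. |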

Used well, top is very effective at helping us spot where the system's bottlenecks or problems are.

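2 Using the ps tool

ps is also one of the most commonly used tools for viewing processes. It has many options; some of the common ones are explained below.

The -l option lists the processes associated with your current bash login session: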

ps -l

The following command is used more often; it lists information about all processes:

ps aux

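Here is an overview of the fields that can appear and what they mean:

| Field | Explanation |
| --- | --- |
| F | Process flags: 1 means the process was forked but did not exec; 4 means the process used superuser (root) privileges |
| USER | User who owns the process |
| PID | Process ID |
| PPID | PID of the parent process |
| SID | Session ID |
| TPGID | ID of the foreground process group |
| %CPU | Percentage of CPU used by the process |
| %MEM | Percentage of memory used by the process |
| NI | Nice value of the process |
| VSZ | Virtual memory size of the process |
| RSS | Size of the pages resident in memory |
| TTY | Terminal ID |
| S or STAT | Process state |
| WCHAN | Resource the process is waiting on |
| START | Time the process was started |
| TIME | CPU time consumed by the process |
| COMMAND | Command name and arguments |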

Processes whose TPGID column shows -1 have no controlling terminal; in other words, they are daemons.
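A small filter can pick those rows out of a listing. This is a sketch: the function name no_tty_processes is hypothetical, and it assumes the input has a header row containing a TPGID column, as the output of ps axj does.

```shell
# Keep the header and the rows whose TPGID column is -1.
# The column position is looked up from the header rather than hard-coded.
no_tty_processes() {
    awk 'NR == 1 {
             for (i = 1; i <= NF; i++) if ($i == "TPGID") col = i
             print
             next
         }
         col && $col == -1'
}

# Example on a live system:
#   ps axj | no_tty_processes
```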

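STAT shows the state of the process. There are many states, as listed below:

| State | Explanation |
| --- | --- |
| R | Running |
| S | Interruptible sleep; waiting to be woken |
| D | Uninterruptible sleep |
| T | Stopped or being traced |
| X | Dead; about to be removed |
| Z | Zombie process |
| W | Paging (memory swapping) |
| N | Low-priority process |
| < | High-priority process |
| s | Session leader |
| L | Has pages locked in memory |
| l | Multithreaded |
| + | Foreground process |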

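D is the uninterruptible sleep state. A process in this state does not accept any outside signal, so it cannot be killed with the kill command, whether with kill -9 or kill -15.

A process is usually in this state because something has gone wrong during I/O.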
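To look for such processes, a listing can be filtered on the state column. A minimal sketch, assuming the state code is the second column of the input; the function name uninterruptible is made up here.

```shell
# Keep the header and the rows whose state (second column) starts with D.
uninterruptible() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Example on a live system (wchan shows what the process is waiting on):
#   ps -eo pid,stat,wchan:32,comm | uninterruptible
```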

To look for one particular process, we can combine ps with grep and regular expressions:

ps aux | grep zsh
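One wrinkle with this pipeline is that the grep process itself contains the search word and so shows up in the result. A common workaround is to wrap the first letter in brackets; the helper below is a sketch with a hypothetical name, find_process, and it assumes a plain name with no regular-expression characters.

```shell
# Filter a process listing for NAME without matching the grep command itself.
# The pattern [z]sh matches "zsh" but not the literal text "[z]sh" that
# appears on grep's own command line.
find_process() {
    first=$(printf '%s' "$1" | cut -c 1)
    rest=$(printf '%s' "$1" | cut -c 2-)
    grep -- "[$first]$rest"
}

# Example on a live system:
#   ps aux | find_process zsh
```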

We can also have ps display related processes as a tree:

ps axjf

If these views do not put the information you want together, you can use a command like the following to choose which columns are displayed:

ps -afxo user,ppid,pid,pgid,command

3 Using the pstree tool

pstree makes it easy to see how many identical processes there are and, most importantly, how all the processes relate to one another.

pstree

pstree -up

# Options:
# -A : connect the process trees with ASCII characters;
# -p : also show the PID of each process;
# -u : also show the account name that owns each process.

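II. Process management

1 Mastering the kill command

When a process finishes, or is about to terminate abnormally, a signal is involved: the process notifies its parent, or it receives a signal such as SIGHUP and responds by exiting or taking some other action. Such signals are not only sent by the system; we can also send them ourselves with kill to make a process terminate, restart, and so on.

Earlier we used kill to manage some of our jobs. Now we will try using kill on processes that are not jobs, addressing them directly by PID.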

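# First, open gedit and gvim from the graphical interface; ps shows them
ps aux

# Use signal 9 to force the gedit process to end (1608 is gedit's PID in this example)
kill -9 1608

# Searching for the process again finds nothing
ps aux | grep gedit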
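Signal 9 cannot be caught, so the process gets no chance to clean up. A gentler pattern, in line with the earlier advice to try signal 15 first, is sketched below. The function name stop_process and the default grace period of 5 seconds are assumptions made for this example.

```shell
# Ask a process to exit with SIGTERM (15); if it is still alive after a
# grace period, fall back to SIGKILL (9).
# Usage: stop_process PID [SECONDS]
stop_process() {
    pid=$1
    grace=${2:-5}
    kill -15 "$pid" 2>/dev/null || return 1      # no such process, or not permitted
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # gone: SIGTERM was enough
        sleep 1
        grace=$((grace - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    kill -9 "$pid"                                # last resort
}

# Example: stop_process 1608 10
```

kill -0 sends no signal at all; it only checks whether the process still exists and can be signalled.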

 

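2 Process execution order

When we use ps we can see that most processes are sleeping. If all of these processes were woken up, which one would get the CPU first, and in what order would the rest follow? How should the scheduling queue be arranged?

The answer is the process's priority value, which determines its scheduling priority. That priority is controlled and expressed by the PR and nice values mentioned above.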

The nice value can be changed with the nice command. Note that it can be set anywhere from -20 to 19. root has full authority: it can adjust its own processes as well as other users' processes, and it can use any value. An ordinary user can only adjust their own processes, and only within the range 0 to 19; the system imposes this restriction to keep ordinary users from seizing system resources.

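# This experiment cannot be done in the lab environment because of insufficient permissions; try it locally

# Start a program in the background, or open it from the graphical interface
nice -n -5 vim &

# Use ps to check its priority
ps -afxo user,ppid,pid,stat,pri,ni,time,command | grep vim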

We can also use renice to change the priority of a process that is already running:

renice -5 pid
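The permission rule described in this section can be sketched as a pre-check before calling nice or renice. This is the article's simplified rule, not the kernel's full permission logic, and the function name can_set_nice is hypothetical.

```shell
# Return success if a user with the given UID may request the nice value.
# Usage: can_set_nice UID VALUE
can_set_nice() {
    uid=$1
    value=$2
    if [ "$value" -lt -20 ] || [ "$value" -gt 19 ]; then
        return 1                 # outside the valid range -20..19
    fi
    if [ "$uid" -eq 0 ]; then
        return 0                 # root may use any value in the range
    fi
    [ "$value" -ge 0 ]           # ordinary users: 0..19 only
}

# Example:
#   can_set_nice "$(id -u)" -5 && nice -n -5 vim
```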

 
