Linux provides several tools for viewing process status information.
Some of them let us watch the state of processes and of the system (CPU, memory, and so on) in real time:
top
We can also take a static snapshot of the current processes:
ps
And we can view the tree structure of the currently active processes:
pstree
1 Using the top tool
top is one of our most commonly used viewing tools. It shows, in real time, how some key system information is changing:
top
top is a program that runs in the foreground, so once started it takes over the terminal with a full-screen interface, through which we get real-time information about the system and its processes. Inside that interface we can operate on and filter the display with a number of commands. Let's begin by understanding what information is shown.
First, the first line of top's display:
Content | Explanation |
---|---|
top | The name of the current program |
11:05:18 | The current system time |
up 8 days,17:12 | How long the machine has been running since boot |
1 user | Only one user is currently logged in |
load average: 0.29,0.20,0.25 | The system's average load over the last 1, 5, and 15 minutes respectively |
Wikipedia describes load average as follows: system load is a measure of the amount of work that a computer system is doing, that is, a measure of the CPU's current workload. Specifically, it is the average length of the run queue: the average number of processes running on or waiting for the CPU over a period of time.
How should we read the load average figures?
Assume the system has a single CPU with a single core. Think of it as a one-lane bridge, and of the CPU's tasks as cars.
- load = 0 means there are no cars on the bridge: the CPU has no tasks;
- load < 1 means there are few cars on the bridge and traffic flows smoothly: the CPU has little to do and resources are still plentiful;
- load = 1 means the bridge is completely full of cars, with no gaps: the CPU is working at full capacity and all of its resources are in use, but the load is still within what it can handle, only a little slow;
- load > 1 means the bridge is full and cars are queuing outside it as well: the CPU is working at full capacity and many processes are still waiting. A value above 2 or 3 means the requested work is 2 to 3 times what the CPU can handle, and a value above 5 means the system is severely overloaded.
That is the single-CPU, single-core case. On a real machine we need to divide this value by the number of cores before interpreting it. The following commands show the number of CPUs and the number of cores:
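# Number of physical CPUs
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
# Number of cores on each CPU (counts the logical processors that belong to physical CPU 0)
cat /proc/cpuinfo | grep "physical id" | grep -c ": 0$"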
From the above, the critical load value is 1, but in practice experienced operations engineers and system administrators set the threshold at 0.7. Note that the values below are per-core values, that is, the load after dividing by the number of cores; do not confuse them with the raw figures.
- If load < 0.7, there is no need to pay attention to it;
- If 0.7 < load < 1, keep an eye on it: the system can still cope, but the value is not far from the critical point;
- If load = 1, be vigilant: there are no spare resources left and the system is already running flat out;
- If load > 5, the system is close to collapse, and you will need to work overtime to fix the problem.
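The thresholds above can be turned into a small check. The following is a minimal sketch, not a monitoring tool: the function name `classify_load` and the labels it prints are made up for illustration, and grouping every per-core value from 1 up to 5 under "alert" is an assumption, since the list only names the points 1 and 5. It also assumes `nproc` is available and counts logical processors.

```shell
#!/bin/sh
# classify_load LOAD CORES
# Hypothetical helper: divides a load average by the number of cores and
# maps the per-core value onto the thresholds described in the text.
classify_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        per_core = load / cores
        if (per_core < 0.7)      print "ok"
        else if (per_core < 1)   print "watch"
        else if (per_core <= 5)  print "alert"       # assumption: 1..5 grouped together
        else                     print "overloaded"
    }'
}

# Example: classify the 15-minute load average of the current machine.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one five fifteen rest < /proc/loadavg
    echo "15-minute load $fifteen on $(nproc) cores: $(classify_load "$fifteen" "$(nproc)")"
fi
```

Passing the core count as an argument keeps the function usable with whichever way of counting cores you prefer.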
We usually look at the 15-minute value first to see the general trend, and then compare it with the 5-minute value to see whether the load is falling.
The busybox source shows that top reads these load values from /proc/loadavg; the kernel checks the number of active processes every five seconds and recomputes the values from that. How exactly is the load calculated? This is the kernel source that computes it:
#define FSHIFT    11            /* nr of bits of precision */
#define FIXED_1   (1<<FSHIFT)   /* 1.0 as fixed-point */
#define LOAD_FREQ (5*HZ)        /* 5 sec intervals: the load average is recomputed every 5 seconds */
#define EXP_1     1884          /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5     2014          /* 1/exp(5sec/5min) */
#define EXP_15    2037          /* 1/exp(5sec/15min) */

#define CALC_LOAD(load, exp, n) \
    load *= exp; \
    load += n*(FIXED_1 - exp); \
    load >>= FSHIFT;

unsigned long avenrun[3];
EXPORT_SYMBOL(avenrun);

/*
 * calc_load - given tick count, update the avenrun load estimates.
 * This is called while holding a write_lock on xtime_lock.
 */
static inline void calc_load(unsigned long ticks)
{
    unsigned long active_tasks; /* fixed-point */
    static int count = LOAD_FREQ;

    count -= ticks;
    if (count < 0) {
        count += LOAD_FREQ;
        active_tasks = count_active_tasks();
        CALC_LOAD(avenrun[0], EXP_1, active_tasks);
        CALC_LOAD(avenrun[1], EXP_5, active_tasks);
        CALC_LOAD(avenrun[2], EXP_15, active_tasks);
    }
}
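The three CALC_LOAD calls at the end of calc_load perform the actual calculation: each one updates an exponentially weighted moving average of the number of active tasks, over 1, 5, and 15 minutes respectively.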
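To see what CALC_LOAD does with real numbers, the same fixed-point update can be reproduced with shell integer arithmetic. This is only a sketch: the function name `calc_load_step` is invented here, 1884 is the kernel's fixed-point constant for the 1-minute average, and the third argument is the number of active tasks already multiplied by FIXED_1, which is the form the kernel passes in.

```shell
#!/bin/sh
FSHIFT=11
FIXED_1=$((1 << FSHIFT))   # 2048 represents 1.0
EXP_1=1884                 # 1/exp(5s/1min) in fixed point

# calc_load_step LOAD EXP N
# One CALC_LOAD update. LOAD and N are fixed-point values (real value * 2048).
calc_load_step() {
    echo $(( ($1 * $2 + $3 * (FIXED_1 - $2)) >> FSHIFT ))
}

# Start from an idle system and keep exactly one task runnable for one minute
# (12 samples, one every 5 seconds).
load=0
i=0
while [ "$i" -lt 12 ]; do
    load=$(calc_load_step "$load" "$EXP_1" "$FIXED_1")
    i=$((i + 1))
done

# Convert the fixed-point result back to a decimal for display.
awk -v l="$load" -v one="$FIXED_1" 'BEGIN { printf "1-minute load after 60s: %.2f\n", l / one }'
```

The first update moves the load from 0 to 164/2048, about 0.08, and each later update closes a fixed fraction of the remaining gap, which is why the 1-minute figure climbs towards 1.0 gradually instead of jumping there.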
Back to the main topic. The second line of top's display is a summary of process counts:
Content | Explanation |
---|---|
Tasks: 26 total | Total number of processes |
1 running | 1 process is running |
25 sleeping | 25 processes are sleeping |
0 stopped | No processes are stopped |
0 zombie | No zombie processes |
The third line summarizes CPU usage:
Content | Explanation |
---|---|
Cpu(s): 1.0%us | Percentage of CPU time used by user-space processes |
1.0%sy | Percentage of CPU time used by the kernel |
0.0%ni | Percentage of CPU time used by user-space processes whose priority (nice value) has been changed |
97.9%id | Percentage of CPU time spent idle |
0.0%wa | Percentage of CPU time spent waiting for I/O |
0.1%hi | Percentage of CPU time spent servicing hardware interrupts (Hardware IRQ) |
0.0%si | Percentage of CPU time spent servicing software interrupts (Software IRQ) |
0.0%st | (Steal time) Percentage of time a virtual CPU waits for a real CPU while the hypervisor services other virtual machines |
CPU utilization is a statistic of how the CPU was used over a period of time; it shows how much of that period the CPU was occupied. Load average measures something different: it is not a measure of CPU usage, but of how many processes were being run by, or waiting for, the CPU over a period of time. The two indicators are not the same.
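To make the contrast concrete, CPU utilization can be derived from two samples of the aggregate `cpu` line in /proc/stat, whose columns are cumulative counters in the order user, nice, system, idle, iowait, irq, softirq, steal. The sketch below is an illustration under that assumption; the function name `cpu_busy_pct` is made up, and counting iowait as idle time is a choice made here, not something top prescribes.

```shell
#!/bin/sh
# cpu_busy_pct "FIRST_CPU_LINE" "SECOND_CPU_LINE"
# Each argument is the aggregate "cpu ..." line of /proc/stat, taken at two
# different moments. Prints the busy percentage between the two samples.
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # us ni sy id wa hi si st
            idle = $5 + $6                          # id + wa
            if (NR == 1) { t0 = total; i0 = idle } else { dt = total - t0; di = idle - i0 }
        }
        END { if (dt > 0) printf "%.1f\n", 100 * (dt - di) / dt; else print "0.0" }'
}

# Example: sample the current machine one second apart.
if [ -r /proc/stat ]; then
    first=$(head -n 1 /proc/stat)
    sleep 1
    second=$(head -n 1 /proc/stat)
    echo "CPU busy over the last second: $(cpu_busy_pct "$first" "$second")%"
fi
```

A machine can report a low busy percentage here while its load average is high, for example when many processes are waiting on I/O.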
The fourth line summarizes memory usage:
Content | Explanation |
---|---|
8176740 total | Total amount of physical memory |
8032104 used | Amount of physical memory in use |
144636 free | Amount of free memory |
313088 buffers | Amount of memory used as kernel buffers |
Note
The maximum physical memory actually available on the system is not just free, but free + buffers + cached (the cached value is shown on the swap line).
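The sum in the note can be computed directly from /proc/meminfo. This is a rough sketch: it assumes the `MemFree`, `Buffers`, and `Cached` fields correspond to top's free, buffers, and cached figures, and the function name `reclaimable_kb` is invented for the example. Newer kernels also expose a `MemAvailable` field, which is usually a better estimate where it exists.

```shell
#!/bin/sh
# reclaimable_kb: reads /proc/meminfo-style text on stdin and prints
# MemFree + Buffers + Cached in kB.
reclaimable_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }'
}

# Example: report the figure for the current machine.
if [ -r /proc/meminfo ]; then
    echo "free + buffers + cached: $(reclaimable_kb < /proc/meminfo) kB"
fi
```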
The fifth line summarizes swap usage:
Content | Explanation |
---|---|
total | Total amount of swap |
used | Amount of swap in use |
free | Amount of free swap |
cached | Total amount of cached swap: memory contents that were swapped out and later swapped back in, but whose copy in swap has not yet been overwritten |
Below that comes the list of processes:
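Column Name | Explanation |
---|---|
PID | Process ID |
USER | The user who owns the process |
PR | The process's scheduling priority value |
NI | The process's nice value |
VIRT | Total virtual memory used by the process |
RES | Physical memory used by the process, also called resident memory |
SHR | Amount of shared memory used by the process |
S | Process state: S=sleep R=running Z=zombie |
%CPU | The process's CPU utilization |
%MEM | The process's memory utilization |
TIME+ | Total CPU time the process has used |
COMMAND | Name of the command the process is running |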
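Note
The NICE value is called the static priority. It is a user-space priority value ranging from -20 to 19: the smaller the value, the higher the process's priority, and the larger the value, the lower its priority. -20 is the highest priority, 0 is the default, and 19 is the lowest.
The PR value (Priority) is called the dynamic priority and is the priority the process actually has inside the kernel. The range of process priorities is defined by a macro named MAX_PRIO, whose value is 140. Linux therefore implements 140 priority levels, from 0 to 139, where a smaller value means a higher priority. Of these, 0 - 99 are for real-time processes and 100 - 139 are for ordinary user processes.
For the 100 to 139 part, the PR value that top shows maps to the nice value as follows:
PR = 20 + (-20 to +19)
Here -20 to +19 is the nice value. So although both are priorities and are closely related, their values and their scope are not the same.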
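The mapping PR = 20 + nice is simple enough to check by hand, but a tiny helper makes the resulting range visible. The function name `top_pr_from_nice` is hypothetical, and the sketch only covers ordinary (non-real-time) processes.

```shell
#!/bin/sh
# top_pr_from_nice NICE
# Hypothetical helper: applies PR = 20 + nice for ordinary processes and
# rejects values outside the nice range -20..19.
top_pr_from_nice() {
    if [ "$1" -lt -20 ] || [ "$1" -gt 19 ]; then
        echo "nice value must be between -20 and 19" >&2
        return 1
    fi
    echo $((20 + $1))
}

# Print the PR value for the highest, default, and lowest nice values.
for n in -20 0 19; do
    echo "nice $n -> PR $(top_pr_from_nice "$n")"
done
```

So an ordinary process shows a PR between 0 and 39 in top, with 20 for a process started at the default nice value.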
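**VIRT** is the total virtual memory used by the task. It includes all code, data, and shared libraries, plus pages that have been swapped out to swap space.
As mentioned earlier, top is a foreground program, so it is interactive: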
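Common interactive command | Explanation |
---|---|
q | Quit the program |
l | Toggle the display of the load average and uptime information |
P | Sort by CPU usage percentage |
M | Sort by resident memory size |
i | Ignore idle and zombie processes; this command is a toggle |
k | Terminate a process; top prompts for the PID and the signal to send. Signal 15 is normally used to terminate a process; if it does not exit normally, use signal 9. This command is disabled in secure mode. |
Used well, top is a very effective way to find where the system's bottlenecks or problems lie.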
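2 Using the ps tool
ps is also one of the tools we use most often to view processes. It has many options; some of the common ones are explained below.
The -l option lists the processes associated with the bash of your current login session: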
ps -l
The following command is used more often; it lists information about all processes:
ps aux
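Let's get an overview of the information that is displayed and what it means:
Content | Explanation |
---|---|
F | Process flags: 1 means the child was forked but did not exec; 4 means the process used superuser (root) privileges |
USER | User who owns the process |
PID | Process ID |
PPID | PID of the parent process |
SID | Session ID |
TPGID | ID of the foreground process group |
%CPU | Percentage of CPU used by the process |
%MEM | Percentage of memory used |
NI | Nice value of the process |
VSZ | Virtual memory size of the process |
RSS | Size of the pages resident in memory |
TTY | Terminal ID |
S or STAT | Process state |
WCHAN | Resource the process is waiting on |
START | Time the process was started |
TIME | CPU time consumed by the process |
COMMAND | Command name and arguments |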
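Processes with -1 in the TPGID column have no controlling terminal; that is, they are daemons.
STAT shows the state of the process. There are many possible states, listed in the table below: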
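State | Explanation |
---|---|
R | Running |
S | Interruptible sleep (waiting to be woken) |
D | Uninterruptible sleep |
T | Stopped (suspended or being traced) |
X | Dead (about to be removed) |
Z | Zombie process |
W | Paging (memory swapping) |
N | Low-priority process |
< | High-priority process |
s | Session leader |
L | Locked (has pages locked in memory) |
l | Multi-threaded |
+ | Foreground process |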
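D is the uninterruptible sleep state. A process in this state does not respond to any outside signal, so it cannot be killed with the kill command, whether kill, kill -9, or kill -15.
A process usually ends up in this state because something has gone wrong during I/O.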
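Since such processes cannot be killed, the practical first step is usually just to find them. The following sketch filters ps output for states that start with D; the function name `filter_uninterruptible` is invented for the example, and it assumes a ps that accepts the `-e` and `-o` options.

```shell
#!/bin/sh
# filter_uninterruptible: reads "PID STAT COMMAND" lines on stdin and keeps
# those whose state starts with D (uninterruptible sleep).
filter_uninterruptible() {
    awk '$2 ~ /^D/'
}

# List D-state processes on the current machine (the list is often empty).
if command -v ps >/dev/null 2>&1; then
    ps -e -o pid= -o stat= -o comm= | filter_uninterruptible
fi
```

The `=` after each column name suppresses the header line, so the filter only ever sees process rows.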
To look for one particular process, we can combine ps with grep and regular expressions:
ps aux | grep zsh
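One side effect of this pipeline is that the grep process itself usually appears in the result, because its own command line contains the search word. A common workaround is to wrap the first letter in a bracket expression. The helper name `self_excluding_pattern` below is hypothetical; where it is installed, `pgrep` avoids the problem in a different way.

```shell
#!/bin/sh
# self_excluding_pattern NAME
# Hypothetical helper: turns "zsh" into the pattern "[z]sh". The pattern still
# matches the text "zsh", but the grep command line ("grep [z]sh") does not
# contain that text, so grep no longer lists itself.
self_excluding_pattern() {
    first=${1%"${1#?}"}    # first character of the name
    rest=${1#?}            # everything after the first character
    printf '[%s]%s\n' "$first" "$rest"
}

ps aux | grep "$(self_excluding_pattern zsh)" || echo "no zsh process found"
```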
We can also display the processes as a tree when viewing them:
ps axjf
If these views still do not put the information you want together, we can use a command like the following to choose the columns ourselves:
ps -afxo user,ppid,pid,pgid,command
3 Using the pstree tool
pstree shows directly how many copies of the same process there are; most importantly, it shows the relationships between all processes.
pstree
pstree -up
# Options:
# -A : connect the process trees with ASCII characters;
# -p : also show the PID of each process;
# -u : also show the account name that owns each process.
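II. Process management
1 Mastering the kill command
When a process terminates, normally or abnormally, the kernel notifies its parent with a SIGCHLD signal. A process can also receive a signal itself, such as SIGHUP, and end or take some other action in response. Such signals are not only sent by the system: we can use kill to send them ourselves, to end a process, make it restart, and so on.
Earlier we used kill to manage some of our jobs. Now we will try using kill on processes that are not jobs, acting directly on the PID:
# First we opened gedit and gvim from the graphical interface; ps shows them
ps aux
# Use signal 9 to force the gedit process to end
kill -9 1608
# When we look for the process again, it can no longer be found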
ps aux | grep gedit
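Signal 9 gives the process no chance to clean up, so a gentler pattern is to send signal 15 first and only fall back to 9 if the process is still there after a short wait. The sketch below illustrates that idea; the function name `stop_process` and the 5-second default timeout are assumptions made for the example.

```shell
#!/bin/sh
# stop_process PID [TIMEOUT_SECONDS]
# Hypothetical helper: ask politely with SIGTERM (15), and only fall back to
# SIGKILL (9) if the process is still alive after the timeout.
stop_process() {
    pid=$1
    timeout=${2:-5}
    kill -15 "$pid" 2>/dev/null || return 0      # already gone, or not ours to signal
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -9 "$pid" 2>/dev/null
    return 0
}

# Example with a throwaway background process.
sleep 300 &
victim=$!
stop_process "$victim" 3
if kill -0 "$victim" 2>/dev/null; then
    echo "process $victim is still running"
else
    echo "process $victim is gone"
fi
```

`kill -0` sends no signal at all; it only reports whether the process exists and can be signalled, which makes it a convenient liveness check.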
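3.2 Process execution order
When we use ps we can see that most processes are sleeping. If they were all woken up, which one would get the CPU first, and in what order would the rest follow? How should the scheduling queue be ordered?
The answer is the process's priority value, which determines its scheduling priority; and that priority is controlled and expressed through the PR and nice values mentioned above.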
We can modify the nice value with the nice command. Note that nice values range from -20 to 19. root has full authority: it can adjust its own processes as well as those of other users, and can use the whole range. Ordinary users can only adjust their own processes, and only within 0 to 19; the system imposes this restriction to stop ordinary users from grabbing system resources.
# This experiment cannot be done in the lab environment because of insufficient permissions; try it locally
# Start a program in the background, or open one from the graphical interface
nice -n -5 vim &
# Use ps to check its priority
ps -afxo user,ppid,pid,stat,pri,ni,time,command | grep vim
We can also use renice to change the priority of a process that is already running:
renice -5 pid
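Because negative nice values need root, the following sketch uses positive values so that it can be tried as an ordinary user. It assumes a ps that supports `-o ni=` and a renice that accepts the `priority -p pid` form; the helper name `nice_of` is made up for the example.

```shell
#!/bin/sh
# nice_of PID: print the nice value of a process as reported by ps.
nice_of() {
    ps -o ni= -p "$1" | tr -d ' '
}

# Start a throwaway process 5 nice levels below the shell's own priority.
nice -n 5 sleep 300 &
pid=$!
echo "started with nice $(nice_of "$pid")"

# Raise the nice value of the running process to 10. No root is needed,
# because this lowers the priority.
renice 10 -p "$pid" >/dev/null
echo "after renice: nice $(nice_of "$pid")"

# Clean up the throwaway process.
kill "$pid"
```

Note that an ordinary user can normally only move a nice value upwards, so lowering it again afterwards is expected to fail without root.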