System Performance Monitoring (1)
A few questions to start
What is system performance monitoring?
System performance monitoring is a technology that evaluates the performance of a computer system by monitoring various indicators in the computer system (such as CPU utilization, memory usage, disk I/O, etc.). This monitoring can help administrators and developers identify bottlenecks and problems in the system and take appropriate actions to improve the performance of the system. System performance monitoring typically involves using various tools and techniques to collect, analyze, and visualize system metrics.
How can you tell whether the system is busy? Which indicators should you look at?
1. CPU: top
2. Memory: top
3. Disk: capacity and I/O speed (that is, read and write speed)
4. Network: traffic and bandwidth, for example 100 Mb/s or 1000 Mb/s
5. Services (processes and ports): ps aux | grep sshd shows the process, ss -anplut | grep sshd shows the port number
What is the point of monitoring?
1. Keep the business running normally
2. Catch problems before they turn into incidents
top (key topic)
Basics
The top command is a real-time process monitor. It shows the resource usage of each process in the system, including CPU and memory usage, along with summary information such as the number of processes, the number of logged-in users, and the system load. On Linux, top is one of the most commonly used tools.
**Basic syntax of the top command**
top [options]
Using the top command
Interactive keys:
- q: quit
- 1: show the usage of each CPU core
- M: sort by resident memory size (memory)
- P: sort by CPU usage percentage (processor)
- < or > (Shift + , or .): move the sort column left or right
- h: show help
Command-line examples:
[root@sc-mysql-master ~]# top -p 33638    # monitor a single process by PID
[root@gaohui shell]# top -d 3             # set the delay between refreshes to 3 seconds
[root@gaohui shell]# top -p 1             # show only one process (PID 1)
**Common parameters of the top command**
- -d seconds: Specifies the update interval of the top command, that is, how often it is updated. The default is 3 seconds.
- -u username: Only display the process information of the specified username.
- -p pid1,pid2,…: Only display the process information of the specified process ID.
- -c: Display the complete command line.
- -H: Display the thread information of the process.
- -i: Do not display idle processes.
- -n number: Set the number of iterations (refreshes) top performs before it exits.
- -o field: Sort by the specified field. Prefix the field name with + or - to force descending or ascending order.
- -b: Run top in batch mode, which prints plain text suitable for files or scripts. Combine it with -n to limit the output, for example top -b -n 1 for a single snapshot.
What does %Cpu0 : 0.0 us, 0.0 sy, 0.0 ni, 100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st mean?
This line comes from the summary area of top and shows how CPU time was divided since the previous refresh. %Cpu0 is CPU core 0, and us, sy, ni, id, wa, hi, si, and st are the shares of CPU time spent in each category:
- us: percentage of CPU time spent running user-mode programs.
- sy: percentage of CPU time spent running kernel-mode code.
- ni: percentage of CPU time spent running user processes whose priority has been lowered with a positive nice value.
- id: percentage of time the CPU was idle.
- wa: percentage of time the CPU was idle while waiting for I/O operations to complete.
- hi: percentage of CPU time spent handling hardware interrupts.
- si: percentage of CPU time spent handling software interrupts.
- st: percentage of time stolen from this virtual machine by the hypervisor, that is, scheduling delay caused by virtualization.
In the output above, %Cpu0 : 0.0 us, 0.0 sy, 0.0 ni, 100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st means that during the last refresh interval CPU0 ran no user programs, kernel code, or niced processes, handled no hardware or software interrupts, and lost no time to virtualization. It was idle the whole time, so id is 100.0.
Note: put simply, us + sy is roughly the CPU usage rate. A fuller figure is 100 - id, which counts every category other than idle.
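As a rough illustration of how these fields combine, the sketch below parses one %Cpu line copied from top and prints us + sy next to 100 - id. The function name cpu_usage is made up for this example, and it assumes the comma-separated "value label" layout shown in the sample line.

```shell
cpu_usage() {
    # $1: one "%Cpu..." line copied from top's summary area
    printf '%s\n' "$1" | awk '
    {
        sub(/^[^:]*:/, "")               # drop the "%Cpu0 :" prefix
        n = split($0, parts, ",")        # one "value label" pair per part
        for (i = 1; i <= n; i++) {
            split(parts[i], kv, " ")     # kv[1] = value, kv[2] = label
            pct[kv[2]] = kv[1]
        }
        printf "us+sy=%.1f busy=%.1f\n", pct["us"] + pct["sy"], 100 - pct["id"]
    }'
}

cpu_usage "%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni, 100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st"
# prints: us+sy=0.0 busy=0.0
```

On a busy machine the two numbers differ by the time spent in ni, wa, hi, si, and st.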
When top accounts for CPU usage:
us is the CPU time consumed by processes running in user mode
sy is the CPU time consumed by processes running in kernel mode
For example, when the mysqld process reads and writes data, it makes system calls such as read(), write(), and fork(). System calls are the interface the operating system offers to application programs. They are themselves kernel routines that implement specific functions, and the time spent in them is counted under sy.
Press the number 1 to see how many cores there are
IOPS (input/output operations per second): the number of read and write operations performed on the disk per second
This brings in a concept: the process
A process consists of: PCB + code + data
PCB stands for Process Control Block.
It is the data structure the operating system uses to manage a process.
The process number (PID) is stored in the PCB.
A process (Process) usually consists of the following parts:
- Program code: Refers to the instructions and codes that a process needs to execute, usually stored in a binary executable file.
- Data area: Refers to data such as global variables, static variables, constants, and dynamically allocated memory that the process needs to use.
- Stack area: the memory used by function calls in the process, mainly holding function parameters, local variables, and return addresses.
- Heap area: the memory that the process requests dynamically while it is running.
- Process context: basic information about the process, such as process ID, process state, priority, CPU time slice, open file descriptors, environment variables, and signals and their handlers.
- Resources: the hardware and software resources the process uses, such as CPU, memory, disk, and network.
A process is the basic unit of execution and resource allocation in a computer system. By switching between processes (multitasking), the operating system can run multiple programs at the same time, which improves system efficiency and resource utilization.
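On Linux, part of this per-process bookkeeping can be read from /proc/<PID>/status. The sketch below prints a few of those fields for a given PID; the function name show_pcb is made up for this example, and the snippet assumes a Linux system with /proc mounted.

```shell
show_pcb() {
    # $1: PID to inspect; defaults to the current shell
    status_file="/proc/${1:-$$}/status"
    [ -r "$status_file" ] || { echo "cannot read $status_file"; return 1; }
    # Name, state, PID, parent PID, thread count, resident memory
    grep -E '^(Name|State|Pid|PPid|Threads|VmRSS):' "$status_file"
}

show_pcb "$$" || true
```

The full file contains many more fields, such as the user and group IDs and the signal masks.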
Other knowledge along the way
How to check memory?
[root@gaohui ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           3770         232        3404          11         134        3348
Swap:          2047           0        2047
[root@gaohui ~]#
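To turn these columns into percentages, the sketch below reads the output of free -m and reports how much memory is used and how much is still available. The function name mem_summary is made up for this example, and it assumes the seven-column layout shown above (total in column 2, used in column 3, available in column 7).

```shell
mem_summary() {
    # stdin: the output of "free -m"
    awk '$1 == "Mem:" { printf "used=%.1f%% available=%.1f%%\n", $3 / $2 * 100, $7 / $2 * 100 }'
}

# Sample values copied from the free -m output above
mem_summary <<'EOF'
              total        used        free      shared  buff/cache   available
Mem:           3770         232        3404          11         134        3348
Swap:          2047           0        2047
EOF
# prints: used=6.2% available=88.8%
```

On a live system the same function can be fed directly: free -m | mem_summary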
View network traffic
[root@gaohui ~]# yum install dstat -y    # install dstat first
[root@gaohui ~]# dstat -anm              # CPU usage, disk reads and writes, network traffic, memory
Terminal width too small, trimming output.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- -net/total->
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | recv  send>
  0   0 100   0   0   0|  97k   78k|   0     0 |   0     0 |  91   167 |   0     0 >
  0   0 100   0   0   0|   0     0 |  60B  994B|   0     0 |  54    85 |  60B  994B>
  0   0 100   0   0   0|   0     0 |  60B  394B|   0     0 |  55    90 |  60B  394B>
glances
[root@gaohui ~]# yum install epel-release -y    # the EPEL repository must be installed before glances can be installed
[root@gaohui ~]# yum install glances -y
[root@gaohui ~]# glances                         # just run it; an all-in-one monitoring command
View processes
top
ps aux
ps aux | grep sshd    # example
Check port numbers
[root@gaohui ~]# netstat -anplut
View CPU information
[root@gaohui ~]# lscpu
[root@gaohui ~]# cat /proc/cpuinfo
How to see how many users are currently logged in
Run the w command
In-depth explanation of the top command
top - 13:46:54 up 1:17, 3 users, load average: 0.00, 0.01, 0.05
13:46:54 is the current system time
up shows how long the machine has been running since boot: up 1:17 means 1 hour and 17 minutes
3 users means three users are currently logged in
load average is the average system load
Tasks: 113 total, 1 running, 112 sleeping, 0 stopped, 0 zombie
Tasks are processes
Reading the line above: there are 113 processes in total, 1 running, 112 sleeping, 0 stopped, and 0 zombies
running: running on a CPU or ready to run
sleeping: blocked, waiting for an event or a resource
stopped: paused/suspended
zombie: dead but not yet reaped (a zombie process)
In the process list of top, a process in the running state (R) usually shows non-zero CPU usage. If a process stays in the running state for a long time, it is probably doing CPU-intensive work, which raises the load on the system and may deserve attention and optimization.
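The Tasks line can be approximated from ps state codes. The sketch below groups the first letter of each state code roughly the way the Tasks line does: R is running, T or t is stopped, Z is zombie, and everything else is counted as sleeping. The function name count_states is made up for this example, and the grouping is an assumption about how top buckets the states, not taken from its source.

```shell
count_states() {
    # stdin: one ps state code per line (for example Ss, R+, D, Z)
    awk '
    { c = substr($1, 1, 1) }
    c == "R"             { n["running"]++; next }
    c == "T" || c == "t" { n["stopped"]++; next }
    c == "Z"             { n["zombie"]++;  next }
    c != ""              { n["sleeping"]++ }
    END {
        printf "%d total, %d running, %d sleeping, %d stopped, %d zombie\n",
            n["running"] + n["sleeping"] + n["stopped"] + n["zombie"],
            n["running"], n["sleeping"], n["stopped"], n["zombie"]
    }'
}

# Made-up sample state codes
printf '%s\n' Ss R+ S D Z T | count_states
# prints: 6 total, 1 running, 3 sleeping, 1 stopped, 1 zombie
```

On a live system: ps -eo stat= | count_states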
What are zombie processes and orphan processes?
The formal explanation:
When a process finishes but its parent does not call wait() (or a related function) in time to collect it, the child becomes a zombie process. A zombie uses no CPU and almost no memory, but it still holds a process ID and an entry in the kernel's process table. If zombies pile up, the system can run out of process IDs and fail to start new processes.
An orphan process is a child that keeps running after its parent has exited or been killed. The orphan is adopted by the init process (PID 1) and becomes its child. When the orphan eventually exits, init calls wait() to reclaim its resources.
In layman's terms:
The child process exits, and the parent does not use the wait() system call to reclaim the child's PCB. As a result, the child's information still occupies memory in kernel space and lingers like a zombie.
An orphan process is one whose parent has exited or terminated abnormally while it is still running. With no parent left to manage it, it is taken over by init. Orphans are normally harmless, but one that was left behind by accident keeps consuming resources until it is stopped.
How do you find out which signals the machine supports?
kill -l
Signals (signal): a way for processes to communicate with each other
The role of the HUP signal (SIGHUP): it is sent to the processes of a session when the controlling terminal closes, and by default it terminates them
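Processes: view/kill
pstree -p        # show the process tree with PIDs
ps aux           # list all processes
echo $$          # print the PID of the current shell
kill -9 <PID>    # force-kill a process (SIGKILL)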
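How to remove zombie processes
A zombie is already dead, so it cannot be killed directly. It disappears only when its parent reaps it.
First, use the kill command to send a SIGCHLD signal to the parent process, which prompts the parent to call wait() if it handles that signal. The command format is as follows:
kill -s SIGCHLD <parent PID>
If the parent still does not reap the zombie, you can forcibly terminate the parent with the kill command. The zombie is then adopted by init, which reaps it. The command format is as follows:
kill -9 <parent PID>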
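Before signalling anything, it helps to know which processes are zombies and who their parents are. The sketch below filters ps-style rows for state codes beginning with Z; the function name list_zombies and the sample PIDs are made up for this example.

```shell
list_zombies() {
    # stdin: rows of "PID PPID STAT COMMAND" without a header line
    awk '$3 ~ /^Z/ { printf "zombie pid=%s parent=%s cmd=%s\n", $1, $2, $4 }'
}

# Made-up sample rows
list_zombies <<'EOF'
    1     0 Ss   systemd
 2746     1 Ss   sshd
 3001  2746 Z    worker
EOF
# prints: zombie pid=3001 parent=2746 cmd=worker
```

On a live system: ps -e -o pid= -o ppid= -o stat= -o comm= | list_zombies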
Functions and system calls
fork()
wait()
exit()
getpid()
fork(): In Linux, the fork() function is used to create a new process, the new process is called the child process, and the original process is called the parent process. A child process is a copy of the original process, including process memory, context, etc., but has its own independent process ID (PID) and memory space. The fork() function returns twice, once in the parent process to return the PID of the child process, and once in the child process to return 0.
wait(): In Linux, the wait() function waits for a child process to end and collects its exit status. When a child ends, the kernel sends the SIGCHLD signal to the parent. The parent then calls wait() to obtain the child's exit status, which also releases the child's process table entry. If the parent calls wait() before the child has ended, the parent blocks until a child ends and its exit status is returned.
exit(): In Linux, the exit() function ends the current process. Calling exit() releases the resources of the process, including memory and file descriptors. The exit status code is kept for the parent process, which reads it with wait() to learn the result of the child.
getpid(): In Linux, the getpid() function is used to obtain the PID (process ID) of the current process. Each process has a unique PID, which can be used to distinguish different processes. In Linux, PID starts from 1, and process number 1 is the init process, which is the ancestor process of all processes.
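The shell performs these four calls every time it runs a background job, so the cycle can be watched without writing C. The sketch below is only an illustration of that mapping; the exit code 7 is an arbitrary value chosen for the example.

```shell
echo "parent shell PID (what getpid() returns): $$"

( exit 7 ) &                  # fork(): the shell creates a child, which calls exit(7)
child=$!                      # the child's PID, as fork() returns it in the parent

status=0
wait "$child" || status=$?    # wait(): block until the child ends, then collect its status
echo "reaped child $child, exit status $status"
```

Because the parent calls wait, the child never lingers as a zombie.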
load average
Load average is an indicator of system load used on Linux and Unix systems. It is the average number of active processes over a period of time, that is, processes that are using the CPU or waiting for CPU time. On Linux, processes in uninterruptible sleep (usually waiting for disk I/O) are counted as well.
load average: 0.00, 0.01, 0.05
The three values are the system load over the past 1 minute, 5 minutes, and 15 minutes
In other words, the average number of processes in the ready and running queues over 1, 5, and 15 minutes
On a single core, an average above 1 means the CPU is very busy
Load average is closely related to the CPU
Judging the load average:
The strict rule:
A load above the number of CPU cores means the system is very busy
1 CPU core: 1
4 CPU cores: 4
32 CPU cores: 32
The looser rule:
1 CPU core: a load below 5 means the system is very busy but can still cope
4 CPU cores: a load below 4 * 5 = 20 means the system is very busy but can still cope
Processes consume CPU, memory, disk I/O, and network I/O
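Both rules come down to dividing the load by the number of cores. The sketch below applies them in that form: at most 1 per core is fine, below 5 per core is busy but coping, and anything higher is treated as overloaded. The function name load_verdict and the three verdict labels are made up for this example.

```shell
load_verdict() {
    # $1: load average, $2: number of CPU cores
    awk -v load="$1" -v cores="$2" 'BEGIN {
        per_core = load / cores
        if (per_core <= 1)     verdict = "ok"
        else if (per_core < 5) verdict = "busy but coping"
        else                   verdict = "overloaded"
        printf "%.2f per core: %s\n", per_core, verdict
    }'
}

load_verdict 0.05 1    # prints: 0.05 per core: ok
load_verdict 6 4       # prints: 1.50 per core: busy but coping
load_verdict 32 4      # prints: 8.00 per core: overloaded
```

On a live Linux system: load_verdict "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"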
Context switching (context switch) works together with interrupts (interrupt)
Interrupts are divided into soft interrupts and hard interrupts
Switching happens on the CPU
One process has to leave the CPU before another can come in
Each switch takes one process off the CPU and puts one on.
Context switching means that in the operating system, when the CPU switches from one process or thread to another, it needs to save the state of the current process or thread (also called context), and load the state of the next process or thread, to be able to continue execution of the process or thread. The context includes registers, stack pointers, program counters, etc., which record the current execution status of the process or thread.
Context switching is an expensive operation: saving and restoring this state takes CPU time that does no useful work for any process, so frequent switching lowers effective CPU utilization. Reducing unnecessary context switches is therefore an important way to improve system performance.
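On Linux, the kernel keeps a running total of context switches in the ctxt line of /proc/stat, so the switching rate can be estimated by sampling it twice. The sketch below does that over one second; the function name ctxt_count is made up for this example, and the live sample is skipped when /proc/stat is not readable.

```shell
ctxt_count() {
    # Print the total number of context switches since boot.
    # $1: file to read; defaults to /proc/stat (Linux only)
    awk '$1 == "ctxt" { print $2 }' "${1:-/proc/stat}"
}

if [ -r /proc/stat ]; then
    before=$(ctxt_count)
    sleep 1
    after=$(ctxt_count)
    echo "context switches in the last second: $(( ${after:-0} - ${before:-0} ))"
fi
```

The csw column in the dstat output reports the same kind of figure per interval.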
Tool summary for viewing processes:
[root@gaohui ~]# ps -o ppid,user,pid,command
 PPID USER       PID COMMAND
 2746 root      2748 -bash
 2748 root      2896 ps -o ppid,user,pid,command
[root@gaohui ~]#
top
ps aux
ps -ef
ps -o ppid,user,pid,command
pstree
kill
kill command
kill: signal + process ID only
killall: signal + process name, kills by name
pkill: can also kill by terminal
pkill -9 -t pts/3
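A safe way to try kill is on a throwaway process. The sketch below starts a background sleep, sends it SIGTERM, and reads back how it ended; a process killed by a signal reports exit status 128 plus the signal number. The sleep duration is arbitrary.

```shell
sleep 30 &                  # start a long-running child to practice on
pid=$!

kill -s TERM "$pid"         # same as "kill -15 $pid"

status=0
wait "$pid" || status=$?    # reap the child and collect how it ended
echo "child $pid ended with status $status (signal $((status - 128)))"
```

With kill -9 instead of kill -s TERM, the status would be 128 + 9 = 137.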
Check which process is running on which CPU
In top: press f, move to the P field (last used CPU), press Space to select it, then press q to return
ps -eo pid,%cpu,%mem,psr,command
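The psr column can also be summarized to see how processes are spread across cores. The sketch below counts rows per CPU number; the function name per_cpu_counts and the PSR values in the sample are made up for this example.

```shell
per_cpu_counts() {
    # stdin: "ps -eo pid,psr,comm" style output, including the header line
    # output: one "CPU COUNT" pair per line, ordered by CPU number
    awk 'NR > 1 { n[$2]++ } END { for (cpu in n) print cpu, n[cpu] }' | sort -n
}

# Made-up sample rows
per_cpu_counts <<'EOF'
  PID PSR COMMAND
    1   0 systemd
 2746   1 sshd
 2748   0 bash
 2896   3 ps
EOF
```

For the sample this prints three lines: "0 2", "1 1", and "3 1". On a live system: ps -eo pid,psr,comm | per_cpu_counts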