System Performance Monitoring(1)

A few questions to start

What is system performance monitoring?

System performance monitoring evaluates how well a computer system is running by tracking indicators such as CPU utilization, memory usage, and disk I/O. It helps administrators and developers find bottlenecks and faults and take action to improve performance. In practice it means using tools to collect, analyze, and visualize system metrics.

How can you tell whether the system is busy? Which indicators should you look at?

1. CPU --> top

2. Memory --> top

3. Disk (capacity and I/O speed, that is, read and write speed)

4. Network (traffic and bandwidth) --> 100 Mb/s, 1000 Mb/s

5. Services (processes, ports) --> ps aux | grep sshd to see the process, ss -anplut | grep sshd to see the listening port

What is the purpose of monitoring?

1. Guarantee and maintain the normal operation of the business

2. Prevent accidents before they happen

top (key topic)

Basics

The top command is a real-time process monitor. It shows the resource usage of each process (CPU and memory) along with system-wide information such as the number of processes, the number of logged-in users, and the load average. It is one of the most commonly used tools on Linux.

**Basic syntax of the top command**
top [options]
Using the top command

1. Press q to quit

2. Press the number 1 to show the usage of each CPU core

M: sort by resident memory size (memory)

P: sort by CPU usage percentage (processor)

[root@sc-mysql-master ~]# top -p 33638    Monitor only the specified process

shift + < or > to move the sort column left or right

h: show help

[root@gaohui shell]# top -d 3    Set the delay between refreshes (here 3 seconds)

[root@gaohui shell]# top -p 1    Show only one process (here PID 1)
**Common parameters of the top command**
  • -d seconds: Sets the refresh interval. The default is 3 seconds.
  • -u username: Shows only the processes of the given user.
  • -p pid1,pid2,…: Shows only the processes with the given process IDs.
  • -c: Shows the full command line instead of just the program name.
  • -H: Shows individual threads instead of one line per process.
  • -i: Hides idle processes.
  • -n number: Sets the number of refreshes top performs before exiting.
  • -o field: Sorts by the given field.
  • -b: Runs top in batch mode, which writes plain text suitable for files or scripts. It is usually combined with -n to limit the number of refreshes.
What does %Cpu0 : 0.0 us, 0.0 sy, 0.0 ni, 100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st mean?

This is one line of top's CPU summary. It shows how CPU time was split during the most recent refresh interval (3 seconds by default). %Cpu0 is the first core, and us, sy, ni, id, wa, hi, si, and st are the percentages of CPU time spent in each state:

  • us: time spent running user-mode (user space) programs.
  • sy: time spent running kernel-mode code.
  • ni: time spent running user processes whose priority has been lowered with a positive nice value.
  • id: time the CPU was idle.
  • wa: time the CPU was idle while waiting for I/O to complete.
  • hi: time spent handling hardware interrupts.
  • si: time spent handling software interrupts.
  • st: steal time, that is, time a virtual machine wanted to run but the hypervisor gave the physical CPU to something else.

In the output above, %Cpu0 : 0.0 us, 0.0 sy, 0.0 ni, 100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st means that during the interval CPU0 ran no user programs, kernel code, or niced processes, handled no hardware or software interrupts, and lost no time to virtualization. It was completely idle, so id is 100.0.
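As a rough sketch of how these fields combine, the busy share of a core is everything except id. The helper below is hypothetical (cpu_busy is not part of top). It assumes the procps-ng summary layout shown above, where the idle figure sits right before the word id.

```shell
#!/usr/bin/env bash
# cpu_busy (hypothetical helper): print 100 - id for one top-style CPU line.
cpu_busy() {
    printf '%s\n' "$1" | awk '{
        gsub(/,/, " ")                    # "ni,100.0 id," -> "ni 100.0 id "
        for (i = 2; i <= NF; i++) {
            if ($i == "id") {             # the number before "id" is the idle share
                printf "%.1f\n", 100 - $(i - 1)
                exit
            }
        }
    }'
}

# The sample line from this section: fully idle, so nothing is busy.
cpu_busy '%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni, 100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st'
```

On a live system the same kind of line can be taken from top -b -n 1, assuming top is installed.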

Note, put simply:
us + sy is roughly the CPU usage; id is the idle share

When top reports CPU usage:

us is the CPU time consumed by processes running in user mode

sy is the CPU time consumed by processes running in kernel mode

A process such as mysqld does its work through system calls, for example read(), write(), and fork(). System calls are the interface the operating system exposes to application programs. Each one is a kernel routine that implements a specific function, and the time spent inside it is counted as sy.

Press the number 1 to see how many cores there are

IOPS (input/output operations per second): the number of read and write operations the disk completes per second
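One hedged way to estimate IOPS on Linux is to sample /proc/diskstats twice and divide the change in completed operations by the interval. In that file, field 3 is the device name, field 4 the reads completed, and field 8 the writes completed. The function names and the device name sda below are assumptions for illustration.

```shell
#!/usr/bin/env bash
# disk_ops (hypothetical helper): sum completed reads and writes for one
# device, reading /proc/diskstats-formatted text on stdin.
disk_ops() {
    awk -v dev="$1" '$3 == dev { print $4 + $8 }'
}

# iops (hypothetical helper): operations per second between two samples.
iops() {
    awk -v before="$1" -v after="$2" -v secs="$3" \
        'BEGIN { printf "%.1f\n", (after - before) / secs }'
}

# Made-up sample values: 1500 operations, then 1800 three seconds later.
iops 1500 1800 3

# Live usage, assuming a disk named sda:
#   a=$(disk_ops sda < /proc/diskstats); sleep 3
#   b=$(disk_ops sda < /proc/diskstats); iops "$a" "$b" 3
```

The sample call prints the average rate over the three-second window. A longer window smooths out short bursts.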

This brings in a concept: the process

A process consists of: PCB + code + data

PCB stands for Process Control Block.

The PCB is the data structure the operating system uses to manage a process.

The PCB stores the process number (PID) along with the rest of the bookkeeping information for the process.

A process usually consists of the following parts:

  1. Program code: the instructions the process executes, usually loaded from a binary executable file.
  2. Data area: global variables, static variables, and constants used by the process.
  3. Stack area: memory used by function calls, mainly holding function parameters, local variables, and return addresses.
  4. Heap area: memory the process requests dynamically while it is running.
  5. Process context: basic information about the process, such as process ID, process state, priority, CPU time slice, open file descriptors, environment variables, and signals and their handlers.
  6. Resources: the hardware and software resources the process uses, such as CPU, memory, disk, and network.

A process is the basic unit of resource allocation in a computer system. By switching between processes, the operating system can run multiple programs seemingly at the same time, which improves efficiency and resource utilization.

A few related topics

How to check memory usage?

[root@gaohui ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           3770         232        3404          11         134        3348
Swap:          2047           0        2047
[root@gaohui ~]# 
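The available column is usually the most useful figure here, since it estimates how much memory can still be handed to programs. As a small sketch, the hypothetical helper mem_used_pct below turns free output into a single percentage. It assumes the newer free layout shown above, where available is the seventh field of the Mem: row.

```shell
#!/usr/bin/env bash
# mem_used_pct (hypothetical helper): percentage of memory that is NOT
# available, computed from "free -m" style text on stdin.
mem_used_pct() {
    awk '$1 == "Mem:" { printf "%.1f\n", ($2 - $7) / $2 * 100 }'
}

# The sample output from above: (3770 - 3348) / 3770.
mem_used_pct <<'EOF'
              total        used        free      shared  buff/cache   available
Mem:           3770         232        3404          11         134        3348
Swap:          2047           0        2047
EOF

# Live usage: free -m | mem_used_pct
```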
View network traffic
[root@gaohui ~]# yum install dstat -y   # install dstat
[root@gaohui ~]# dstat -anm  # CPU usage, disk reads/writes, network traffic, memory usage
Terminal width too small, trimming output.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- -net/total->
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | recv  send>
  0   0 100   0   0   0|  97k   78k|   0     0 |   0     0 |  91   167 |   0     0 >
  0   0 100   0   0   0|   0     0 |  60B  994B|   0     0 |  54    85 |  60B  994B>
  0   0 100   0   0   0|   0     0 |  60B  394B|   0     0 |  55    90 |  60B  394B>
glances
[root@gaohui ~]# yum install epel-release -y  # glances comes from the EPEL repository, so install that first
[root@gaohui ~]# yum install glances -y
[root@gaohui ~]# glances  # run it for an all-in-one overview
View processes
top
ps aux
ps aux | grep sshd  # example
Check port numbers
[root@gaohui ~]# netstat -anplut
View CPU information
[root@gaohui ~]# lscpu
[root@gaohui ~]# cat /proc/cpuinfo
How to see how many users are currently logged in

Run the w command

In-depth explanation of the top command

top - 13:46:54 up  1:17,  3 users,  load average: 0.00, 0.01, 0.05
  1. 13:46:54 is the current system time

  2. up shows how long the system has been running since boot: 1:17 ----> 1 hour and 17 minutes

  3. 3 users means three users are currently logged in

  4. load average is the average system load


Tasks: 113 total,   1 running, 112 sleeping,   0 stopped,   0 zombie

Tasks means processes

Reading the line above: there are 113 processes in total, 1 running, 112 sleeping, 0 stopped, and 0 zombies
running ----> running on a CPU or ready to run

sleeping ----> blocked, waiting for an event such as I/O or a timer

stopped ----> paused/suspended

zombie ----> dead but not yet reaped by its parent

In top's process list, every process shows its CPU and memory usage. A process that stays in the running state for a long time is usually doing CPU-intensive work, which raises the system load and deserves attention and possibly optimization.
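The Tasks counts can be cross-checked with ps. The sketch below assumes procps ps, whose stat column starts with a state letter (R running or runnable, S and D sleeping, T stopped, Z zombie). count_states is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_states (hypothetical helper): tally processes by the first letter of
# the ps "stat" column, one value per line on stdin.
count_states() {
    awk '{ n[substr($1, 1, 1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Made-up sample input; on a live system use: ps -eo stat= | count_states
printf '%s\n' Ss S R+ 'S<' Z | count_states
```

Only the first letter is counted because the trailing characters (s, +, < and so on) are modifiers, not states.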

What are zombie processes and orphan processes?

The formal explanation:

When a child process finishes but its parent has not yet called wait() (or a related function) to collect it, the child becomes a zombie process. A zombie uses no CPU and almost no memory, but it still holds an entry in the kernel's process table and a PID. A large number of zombies can exhaust the available PIDs, after which no new processes can be created.

An orphan process is a child that keeps running after its parent has terminated or been killed. The orphan is adopted by the init process (PID 1) and becomes its child. When the orphan later exits, init calls wait() to reclaim its resources.

In layman's terms:

The child process exits, but the parent does not use the wait() system call to reclaim the child's PCB. The child's information therefore keeps occupying memory in kernel space, lingering like a zombie.

An orphan process is a process whose parent has exited or terminated abnormally while the process itself is still running. With no parent left to manage it, it is taken over by the init process. An orphan is a normal running process and keeps using resources like any other, so it is worth checking whether it is still needed.

How do you see which signals the machine supports?

kill -l

Signals

Signals: a way for processes to communicate with each other

The role of the HUP signal: when a terminal session ends, the kernel sends SIGHUP to the processes in that session, which terminates them by default

Processes: view/kill

pstree -p

ps aux

echo $$

kill -9
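The effect of a signal shows up in the exit status that the parent collects. As a minimal sketch (signal_exit_status is a hypothetical helper), the shell starts a child, sends it a signal, and reads the status, which shells report as 128 plus the signal number.

```shell
#!/usr/bin/env bash
# signal_exit_status (hypothetical helper): start a child, send it the named
# signal, and print the exit status the parent shell sees.
signal_exit_status() {
    local sig=$1 pid status=0
    sleep 30 &                              # child that would otherwise keep running
    pid=$!
    kill -s "$sig" "$pid"                   # same as kill -TERM / kill -9 by number
    wait "$pid" 2>/dev/null || status=$?    # collect the child's status
    echo "$status"
}

signal_exit_status TERM   # prints 143 (128 + 15)
signal_exit_status KILL   # prints 137 (128 + 9)
```

The numbers 15 and 9 are the values kill -l lists for TERM and KILL on Linux.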

How to remove zombie processes

Send SIGCHLD to the parent process to prompt it to reap its children. This only helps if the parent handles the signal by calling wait(). The command format is as follows:

kill -s SIGCHLD <parent PID>

If the parent still does not reap the zombie, you can forcibly terminate the parent with the kill command. The zombie is then adopted by init, which reaps it. The command format is as follows:

kill -9 <parent PID>
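Before signalling anything, it helps to confirm which processes are zombies and who their parents are. This sketch filters ps output on the state column. zombie_parents is a hypothetical helper, and the sample rows are made up.

```shell
#!/usr/bin/env bash
# zombie_parents (hypothetical helper): read "pid ppid stat comm" rows on
# stdin and report every row whose state starts with Z (zombie).
zombie_parents() {
    awk '$3 ~ /^Z/ { printf "zombie pid=%s parent=%s cmd=%s\n", $1, $2, $4 }'
}

# Made-up sample rows; live usage:
#   ps -eo pid=,ppid=,stat=,comm= | zombie_parents
zombie_parents <<'EOF'
 2748  2746 Ss   bash
 3001  2748 Z    worker
 3002  2748 S    sleep
EOF
```

The parent value in the output is the PID to pass to the kill commands above.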

Functions and system calls

fork()

wait()

exit()

getpid()

fork(): creates a new process. The new process is called the child and the original is the parent. The child is a copy of the parent, including its memory and context, but it has its own process ID (PID) and its own address space. fork() returns twice: in the parent it returns the child's PID, and in the child it returns 0.

wait(): waits for a child process to end and retrieves its exit status. When a child ends, the kernel sends the SIGCHLD signal to the parent, and the parent collects the child's exit status by calling wait(). If the parent calls wait() before the child has ended, the parent blocks until the child ends.

exit(): ends the current process. The resources of the process, including memory and file descriptors, are released, and the exit status code is kept for the parent, which reads it with wait() to learn the result.

getpid(): returns the PID (process ID) of the current process. Every running process has a unique PID, which distinguishes it from other processes. On Linux, PIDs start from 1, and process 1 is the init process, the ancestor of all user-space processes.
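These calls are C library functions, but the shell uses the same machinery, so the pattern can be sketched without a compiler. In the example below (spawn_and_wait is a hypothetical name), ( ... ) & forks a child, $! plays the role of fork()'s return value in the parent, exit sets the child's status, and wait collects it. $$ is the shell's own PID, which is what getpid() would return.

```shell
#!/usr/bin/env bash
# spawn_and_wait (hypothetical helper): fork a child that exits with the
# given code, then collect that code in the parent.
spawn_and_wait() {
    local code=$1 child status=0
    ( exit "$code" ) &                 # fork a child that exits at once
    child=$!                           # parent side: PID of the child
    wait "$child" || status=$?         # block until it ends, keep its status
    echo "parent=$$ child=$child status=$status"
}

spawn_and_wait 3
```

Because wait is called promptly, the child never lingers as a zombie. Skipping the wait is exactly what leaves one behind.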

load average

The load average is an indicator of system load on Linux and Unix systems. It is the average number of active processes over a period of time. On Linux that means processes that are using the CPU, waiting for CPU time, or in uninterruptible sleep (usually waiting for disk I/O).

load average: 0.00, 0.01, 0.05

The three numbers are the system load over the past 1, 5, and 15 minutes, that is, the average number of processes in the running and ready queues during each period

On a single core, an average above 1 means the CPU is very busy

load average is closely related to the CPU, so read it against the number of cores

Average load:

The strict rule:

A load above 1 is very busy (1 CPU core)

With 4 CPU cores the threshold is 4

With 32 CPU cores the threshold is 32

The looser rule:

1 CPU core: a load below 5 means the system is very busy but still tolerable

4 CPU cores: a load below 4*5 = 20 means the system is very busy but still tolerable
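The per-core rule can be applied mechanically: divide the load by the number of cores and compare the result with 1. The sketch below uses that strict threshold. load_status is a hypothetical helper, and reading /proc/loadavg and calling nproc are shown only as commented live usage.

```shell
#!/usr/bin/env bash
# load_status (hypothetical helper): "busy" when load per core exceeds 1.
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load / cores > 1) print "busy"; else print "ok"
    }'
}

load_status 0.05 1    # the 15-minute value from the top header shown earlier
load_status 6 4       # made-up value: 1.5 per core

# Live usage on Linux:
#   read -r one five fifteen _ < /proc/loadavg
#   load_status "$one" "$(nproc)"
```

To apply the looser rule instead, the comparison value 1 would become 5.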

Processes consume CPU, memory, disk I/O, and network I/O

Context switching (context switch) ----> works together with interrupts

Interrupts are divided into soft interrupts and hard interrupts

Switching happens on the CPU: the current process must be switched out before the next one can be switched in

One process goes in, one goes out.

A context switch happens when the CPU switches from one process or thread to another. The operating system saves the state (the context) of the current process or thread and loads the state of the next one so that it can resume where it left off. The context includes the registers, the stack pointer, and the program counter, which together record the current execution state.

Context switching is expensive because state has to be saved and restored each time, and the CPU time spent on switching is not available for useful work. Reducing unnecessary context switches is therefore an important way to improve system performance.
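On Linux, the kernel keeps per-process context switch counters in /proc/<pid>/status, in the voluntary_ctxt_switches and nonvoluntary_ctxt_switches lines. As a small sketch (ctxt_switches is a hypothetical helper, and the sample numbers are made up), they can be pulled out like this:

```shell
#!/usr/bin/env bash
# ctxt_switches (hypothetical helper): extract both context switch counters
# from /proc/<pid>/status-formatted text on stdin.
ctxt_switches() {
    awk '$1 == "voluntary_ctxt_switches:"    { v = $2 }
         $1 == "nonvoluntary_ctxt_switches:" { n = $2 }
         END { printf "voluntary=%d nonvoluntary=%d\n", v, n }'
}

# Made-up sample input in the same format as the status file.
printf 'Name:\tbash\nvoluntary_ctxt_switches:\t150\nnonvoluntary_ctxt_switches:\t7\n' | ctxt_switches

# Live usage for the current shell: ctxt_switches < /proc/$$/status
```

Voluntary switches happen when the process blocks or yields on its own. Nonvoluntary switches happen when the scheduler preempts it, so a high nonvoluntary count suggests competition for the CPU.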

Summary of tools for viewing processes:

[root@gaohui ~]# ps -o ppid,user,pid,command
  PPID USER        PID COMMAND
  2746 root       2748 -bash
  2748 root       2896 ps -o ppid,user,pid,command
[root@gaohui ~]# 

top

ps aux

ps -ef

ps -o ppid,user,pid,command

pstree

kill

kill commands

kill: sends a signal to a process number (PID)

killall: signal + name, kills processes by name

pkill: can match processes by name or by terminal

pkill -t pts/3 -9

Check which process is running on which CPU

top ----> press f ----> move to field P (last used CPU) ----> press space to select it ----> press q to return

ps -eo pid,%cpu,%mem,psr,command

Origin blog.csdn.net/investor_/article/details/130785600