Network performance testing: a detailed case study of analyzing high CPU usage (soft interrupts)


Foreword

A case where the frequency of soft interrupts is too high.

System configuration:
Ubuntu 18.04, 2 CPUs, 2 GB of memory, two virtual machines in total (VM1 and VM2)

Three tools
sar: a system activity reporting tool; it can not only show the system's current activity in real time, but can also be configured to save and report historical statistics.
hping3: a tool that can construct TCP/IP packets; it can be used for security audits and firewall testing.
tcpdump: a commonly used network packet-capture tool, often used to analyze all kinds of network problems.
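
On Ubuntu 18.04, all three tools can usually be installed with apt; a minimal sketch (sar is shipped in the sysstat package):

# Install the tools used in this case (Ubuntu 18.04)
sudo apt update
sudo apt install -y sysstat hping3 tcpdump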

Run the case with Docker

Execute this command in VM1

docker run -itd --name=nginx -p 80:80 nginx
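
Optionally, before testing with curl, a quick sketch to confirm the container is actually running (container name as above):

# List the running container and its port mapping
docker ps --filter name=nginx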

Confirm that Nginx started normally by using curl

Execute this command in VM2

curl http://172.20.72.58/
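
If Nginx is serving normally, curl returns the default welcome page. To check only the status line, one can optionally request just the response headers (a sketch, using VM1's address from above):

# An "HTTP/1.1 200 OK" line indicates Nginx is up
curl -I http://172.20.72.58/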

Simulate Nginx client requests with hping3

Execute this command in VM2

hping3 -S -p 80 -i u100 172.20.72.58

-S: set the SYN flag (synchronize sequence number) of the TCP protocol
-p 80: the destination port is 80
-i u100: send a network frame every 100 microseconds, i.e. on the order of 10,000 SYN packets per second

Going back to VM1,
the system response has slowed down noticeably: even hitting Enter a few times in the terminal takes a long time to get a response.

Analyze why the system responds slowly.
The following commands are all executed in VM1

Use the top command to view system resource usage

[Screenshot: top output]

The system's CPU usage (user state us and kernel state sy) is not high;
the load average is moderate, there are only 2 processes in the R state, and there are no zombie processes;
however, the CPU usage of the soft-interrupt kernel thread for CPU 1 (ksoftirqd/1) is high, and the proportion of CPU time spent handling soft interrupts (si) has reached 94%;
apart from that, there are no other abnormal processes;
so it can be guessed that soft interrupts are the culprit.

Confirm which type of soft interrupt it is.
By observing the contents of the /proc/softirqs file, you can see the counts for each soft-interrupt type.

What time period do these soft-interrupt counts cover?
They are cumulative counts since the system booted,
so reading the file only once gives nothing but cumulative totals, which have little direct bearing on the problem here.
What we need to pay attention to is the rate of change of these counts.
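
As a rough sketch of measuring that rate by hand (assuming a 2-CPU machine and sampling only the CPU0 column), one can diff two readings of /proc/softirqs taken one second apart:

# Sample the NET_RX count for CPU0 twice, one second apart,
# and print the per-second change
before=$(awk '/NET_RX/ {print $2}' /proc/softirqs)
sleep 1
after=$(awk '/NET_RX/ {print $2}' /proc/softirqs)
echo "NET_RX softirqs/s on CPU0: $((after - before))"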

Dynamically view the output with watch.
Because my machine has only two cores, reading /proc/softirqs directly prints columns for 128 CPUs; for my purposes the first two columns are enough, so the key data needs to be extracted with awk:

watch -d "/bin/cat /proc/softirqs | /usr/bin/awk 'NR == 1{printf \"%-15s %-15s %-15s\n\",\" \",\$1,\$2}; NR > 1{printf \"%-15s %-15s %-15s\n\",\$1,\$2,\$3}'"

[Screenshot: watch -d output of /proc/softirqs]

Result analysis
Soft interrupts such as TIMER (timer interrupts), NET_RX (network receive), SCHED (kernel scheduling) and RCU (RCU locks) are changing constantly, and NET_RX, the network packet-receive soft interrupt, is changing the fastest;
the other types of soft interrupts are necessary for normal Linux scheduling, timekeeping and critical-section protection, so it is normal for them to change.

Check the system's network send and receive status with sar

The analysis above points to the network-receive soft interrupt, so the first step should be to check the system's network reception.

The benefits of sar:
it can observe not only the network throughput (BPS, the number of bytes sent and received per second),
but also the PPS (the number of network frames sent and received per second).

Execute the sar command

sar -n DEV 1
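
When many interfaces are shown, the output can optionally be narrowed to a single NIC, for example (a sketch that keeps the header line and the ens33 rows):

# Show only the header and the ens33 rows of the device statistics
sar -n DEV 1 | grep -E 'IFACE|ens33'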

[Screenshot: sar -n DEV 1 output]

The second column, IFACE, is the network interface.
The third and fourth columns, rxpck/s and txpck/s, are the number of network frames received and sent per second (PPS).
The fifth and sixth columns, rxkB/s and txkB/s, are the number of kilobytes received and sent per second (BPS).

Result analysis
For the network card ens33:

the number of network frames received per second is quite large, almost 80,000, while the number sent is much smaller, only close to 40,000;
the number of kilobytes received per second is only 4611 KB, and the number sent is even smaller, only 2314 KB.

The data for docker0 and veth04076e3
is basically the same as for ens33, only with send and receive reversed: the sent data is larger and the received data is smaller.
This is caused by forwarding through the Linux bridge; there is no need to dig into it for now, it is enough to know that the packets received on ens33 are forwarded by the system to the Nginx service.

Abnormal point
As noted earlier, the problem lies with the network packet-receive soft interrupt, so focus on reception:
the PPS received on ens33 reaches about 80,000, but the received throughput is under 5,000 KB/s, so the network frames appear to be quite small:
4611 * 1024 / 78694 ≈ 60 bytes, meaning the average network frame is only about 60 bytes, obviously a very small frame, which is often referred to as the small-packet problem.
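
The same back-of-the-envelope calculation can be reproduced with shell arithmetic (the numbers are taken from the sar output above):

# Average frame size = received kB/s * 1024 / received frames/s
echo $(( 4611 * 1024 / 78694 ))   # prints 60 (bytes per frame, roughly)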

The key question
How do we know what kind of network frames these are, and where do they come from?

Capture network packets with tcpdump

Known conditions
Nginx listens on port 80, and the HTTP service it provides is based on the TCP protocol

Execute the tcpdump command

tcpdump -i ens33 -n tcp port 80

-i ens33: only capture on the ens33 interface
-n: do not resolve protocol names and host names
tcp port 80: only capture network frames that use the TCP protocol and port 80
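
At this traffic rate it can help to limit or save the capture instead of letting tcpdump scroll indefinitely; for example (a sketch using standard tcpdump options):

# Capture only the first 100 matching packets, then stop
tcpdump -i ens33 -n -c 100 tcp port 80

# Or write the capture to a file for offline analysis (e.g. in Wireshark)
tcpdump -i ens33 -n -w syn_flood.pcap tcp port 80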

[Screenshot: tcpdump output]

172.20.72.59.52195 > 172.20.72.58.80
means that the network frame is sent from port 52195 of 172.20.72.59 to port 80 of 172.20.72.58,
that is, the network frame is sent from port 52195 of the machine running hping3, and the destination is port 80 of the machine where Nginx is located

Flags [S]
indicates that this is a SYN packet

Performance analysis conclusion
Combined with the very high receive PPS (close to 80,000 frames per second) observed with the sar command, it can be concluded that this is a SYN flood attack sent from the address 172.20.72.59.

Solving the SYN flood problem
Block the source IP at the switch or a hardware firewall, so that SYN flood frames are never sent on to the server.
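
If blocking upstream is not immediately possible, common host-side mitigations (a sketch; the source address is the one identified above, and these only relieve the symptom on the server itself) include dropping the attacker's traffic with iptables and making sure SYN cookies are enabled:

# Drop further packets from the attacking address (temporary mitigation)
iptables -I INPUT -s 172.20.72.59 -p tcp --dport 80 -j DROP

# Enable SYN cookies so half-open connections do not exhaust the backlog
sysctl -w net.ipv4.tcp_syncookies=1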

Overall idea of the analysis

The system freezes, and responses slow down when executing commands;
check system resources with top;
the CPU usage (us and sy) is not high, the load average is moderate, the number of running processes does not exceed the number of CPU cores, and there are no zombie processes;
however, the proportion of CPU time spent handling soft interrupts (si) is relatively high, and the CPU usage of the soft-interrupt kernel threads can also be seen in the process list, so soft interrupts are guessed to be the main reason the system is freezing;
check the soft-interrupt types and their change frequency through /proc/softirqs; a direct cat prints columns for 128 CPUs while only two are wanted, so filter with awk and use the watch command to view the output dynamically;
several types of soft interrupts are changing, but the key point is that NET_RX changes extremely frequently and by a large amount; it is the soft interrupt for receiving network packets, so it is tentatively considered the root of the problem;
since the problem is network related, first use the sar command to check the overall network send and receive situation;
it turns out that the received PPS is far larger than the received BPS would suggest, and a quick calculation shows the network frames are very small, the so-called small-packet problem;
next, use tcpdump to capture TCP packets on port 80, and a large number of SYN packets sent from VM2 show up; combined with the sar result, this confirms a SYN flood attack.

The following is the software test engineer learning knowledge architecture diagram for 2023 that I compiled

1. Python programming, from beginner to mastery

2. API automation project practice

3. Web automation project practice

4. App automation project practice

5. Resumes from first-tier companies

6. Test development DevOps system

7. Commonly used automated testing tools

8. JMeter performance testing

9. Summary (a little surprise at the end)

As long as you have a dream in your heart, no matter how difficult it is, if you persevere, you will surely reap the joy of success. Every effort is a seed, sown in the future, watered with sweat, and brilliant flowers will eventually bloom. Believe in yourself, go forward bravely, struggle is the power to create brilliance.

Every effort is an accumulation, and every contribution is a kind of growth. Only by persisting in struggle and continuous pursuit can dreams become reality. Believe in yourself, go forward bravely, as long as you are willing to pay, success will meet you unexpectedly!

Only with unremitting efforts can we create brilliance; only with the courage to forge ahead can we climb the peak; only with perseverance can we pursue our dreams. Believe in yourself, go forward bravely, success will belong to you!


Origin blog.csdn.net/shuang_waiwai/article/details/131438800