Performance Analysis under Linux (4): How to Get Started

==Battlefield Analysis==

Performance analysis is often battlefield analysis. Before we can pick up a coffee and think things over slowly, we first have to talk about how to conduct ourselves on the battlefield.

Field analysis means finding problems in a live, practical environment. When you really need to do performance analysis, you usually do not get the chance to run the program repeatedly, retry, and so on. Tens of millions of users, millions online, millions of socket connections, database records measured in terabytes: these scenarios can never be rebuilt after you go home. Moreover, the operations engineers at the customer site may be nice, but their leaders usually are not. The leader says, "If they can't fix it within the hour, send them home and we'll bring up the backup system"... and you have to grab your laptop and hurry out of someone else's lab.

So time on site is precious. For different kinds of sites, you had better have a set of scripts ready. Whatever happens, first pull out a full set of dmesg, dpkg -l, /proc/cpuinfo, /proc/interrupts, /proc/meminfo, ifconfig, ps -eLf, /var/log and the like, so that everything can be analyzed later; this on-site data is vital for the later analysis. If you do not even know how much memory the system has, what performance are you going to analyze?

The second step is performance itself. We usually look at top first. Don't use interactive mode; just run top -b -n 3 to get a result you can at least save as a record. One drawback of this method is that it does not show the load on each CPU. My trick is to start top interactively (just run top), press 1 to expand the per-CPU view, then press W to write the current configuration out, and then run top -b -n 3 again.

Once top gives us a rough picture, we can usually tell where the problem mainly lies. If the whole system is quite idle yet throughput still will not rise, it means packets are being dropped somewhere, or the bandwidth of the ingress path is insufficient; start looking for the drop point.

If latency is too high, go back and draw the scheduling flow of the whole packet, work out which steps are involved, and then use ftrace to trace those steps.

(Also, in the field, if you need to pin down a startup-speed problem, consider attaching strace or ltrace to trace where the startup time goes.)

If CPU usage is high, you need perf to profile. First look at the perf distribution over time to see where the cycles go; and by the way, it is best to pack up a perf archive to take back with you (perf archive bundles the symbols needed to analyze the perf.data elsewhere).

With that, the on-site work is more or less done. On the way out, start inviting the customer's operations people to dinner, and so on.

==Offline Analysis==

The first thing to do in offline analysis: order a cup of coffee?

Well, that is not the important part. For me, the most important thing is: write me a damn document! This is so important. I have lost count of how many times someone got stuck on site, asked me for help, and then ran over saying, "Kenneth, let me tell you about our progress"... Tell me? Like hell!!!

Submit the analysis report!

Submit the analysis report! !

Submit the analysis report! ! !

I am not your secretary, okay? The entire job of performance analysis is a cycle: build a model, guess the bottleneck, check the guess against the data, re-sample, re-analyze the bottleneck, revise the design, re-sample again... Without writing documents that continuously integrate the phenomena we observe, the whole analysis is built on sand. And you come to tell me in person? So I can write the report for you after your lecture, is that it?

Therefore, the whole analysis should be a process of continuously recording our revisions to the model, and documentation is the most important part of the entire job.

At the same time, writing documents also reminds you to save data. Many people do not take data recording seriously: they run this and that in the working environment and call it a day. I do not know how much work gets wasted that way. Every time I get access to a working environment, the first thing I do is create a directory, put a BRIEF file in it recording the current time, the tester, the environment, and the reason for the test, and only then start collecting data. During the process I never overwrite any "possibly useful" original data. This is a basic work skill, yet many engineers are unwilling to learn this basic discipline.

How to write a document is not something I want to teach here; that you should have learned properly in school. But I do want to point out a mistake many engineers often make. A basic criterion for judging whether such an analysis report is well written is whether you always make your points around "what is the evidence for the bottleneck". The principle is simple, yet many reports forget it as they go: they collect all sorts of nice-looking distribution charts and trend charts along the way, and completely lose track of whether the system has actually hit a bottleneck and what the reason for hitting it is. Many people have talked at me for a long time, and when I asked, "How do you conclude that the load cannot go up any further?", they went blank. In the end this is a matter of picking the easy part and dodging the hard question; better to claim less credit and get the basics right first.

==Example==

Let's again use my cs program as an example (I will register a new GitHub account later to share it). The example is very simple: it simulates one group of threads producing data and writing it into a queue, while another group of threads fetches the data to complete the computation; the computation itself is simulated by the heavy_cal function. Our goal is to maximize computational throughput. So first, the result of a plain run with 4 threads:

This handles 175K tasks per second, but the CPU is still partly idle. Perhaps each thread does some I/O during its computation, which hurts efficiency, so we use more threads (40) to fill in the I/O waits; the result is a very limited improvement:

Since brute force cannot solve the problem, let's look at the ftrace data:

See that? The cs threads go to sleep after running for less than 5 microseconds. What on earth is going on?

The function is written like this:

void *pro_routin(void *arg) {
	struct task *tsk = arg;
	int ret;

	srand((intptr_t)tsk->arg);

	while (1) {
		ret = heavy_cal(rand(), n_p_cal); /* pure computation */
		en_q(ret);                        /* push the result into the queue */
		marker("yield here");             /* drop a marker into the trace buffer */
		yield_method_f();                 /* pluggable yield strategy */
	}
}

heavy_cal is pure computation and cannot cause a pointless sleep. marker() is protected by a spinlock in the kernel and does not sleep either. The only calls that could possibly sleep are yield_method_f() and en_q() (the queue write). We tried removing the yield and it made no difference, so the only remaining suspect is en_q(). Our expectation was that the producer would en_q a few dozen entries, then switch to the consumer to process a few dozen, and so on.

But in fact, under the Linux scheduling algorithm, the consumer is gradually promoted to an interactive thread (the scheduler keeps raising the priority of processes that never use up their time slices, treating them as interactive, so that tasks serving the mouse and keyboard get scheduled first and response latency improves). So as soon as the producer enqueues anything, the higher-priority consumer preempts it, which explains the sub-5-microsecond run times we saw.

After adjusting for this, single-core CPU usage rises to over 92% and throughput rises to 311K. Now let's look at the ftrace data again; it looks like this:

In this trace we also traced the futex calls. We can see a large number of pthread_mutex_unlock calls, but none of them causes a reschedule, and overall performance has improved.

We still have ways to put the remaining idle time to use, but this is only an example, so let's stop here.

==Summary==

This article introduces the most basic performance analysis process. Later, we will discuss some common analysis models in detail to deepen our understanding of these models.


Origin: blog.csdn.net/m0_54437879/article/details/131727029