Performance Tuning Series: Numbers Every Programmer Should Know

Contents

Foreword

Main text

The purpose of looking at this data

1) The CPU is very, very fast

2) Memory is fast, but still too slow compared to the CPU

3) Disk performance is very, very slow

4) Disk sequential I/O is much faster than random read I/O

5) Network transmission is also time-consuming, basically at the millisecond level

Summary

Finally

Recommended reading


Foreword

The question I hear most often in the chat group is: my projects are too plain, what do I talk about in interviews? To be honest, I know exactly how that feels, because that's how I got here too, and almost everyone goes through this stage. So, without further ado, let's get it sorted.

There are several reasons why I chose performance optimization as the direction:

1) Performance optimization is a direction I'm genuinely interested in, and one where I've accumulated some experience over the past few years. In my own interviews, my projects got by almost entirely on performance optimization, so I think it's worth taking out and sharing with you.

2) Performance optimization is broadly applicable: it fits almost every production project, and once you've mastered it you can put it into practice immediately. I'd say there is hardly a project that doesn't need performance optimization, unless the project is just "Hello world".

3) Most of performance optimization is quite simple, with almost no barrier to entry, so even less experienced students can get started easily. It also follows the 80/20 rule: mastering 20% of the material is enough to solve 80% of the problems.

4) Performance optimization shows results easily. Anyone with a little experience knows the worst feeling is finishing a requirement with nothing to show for it. Performance optimization is different: the results are plain numbers. A task that took an hour now runs in five minutes, a twelvefold improvement, simple and blunt.

Enough preamble; let's get into it.

Main text

The title of this article comes from a talk Jeff Dean gave inside Google on distributed systems; the English title of the list is "Numbers Everyone Should Know".

These numbers are closely tied to the performance optimization work in the rest of this series, so I'm putting them in the first article to help you build a basic mental model of performance.

Let's take a look at the numbers Jeff Dean is talking about:

Note: 1μs = 1000ns, 1ms = 1000μs

| Operation | Latency | Latency × 1 billion |
|---|---|---|
| L1 cache read | 0.5ns | 0.5s |
| Branch misprediction | 5ns | 5s |
| L2 cache read | 7ns | 7s |
| Mutex lock/unlock | 25ns | 25s |
| Main memory reference | 100ns | 100s |
| Compress 1KB with Zippy | 3,000ns (3μs) | 50min |
| Send 1KB over a 1Gbps network | 10,000ns (10μs) | 2.8h |
| Random read of 4KB from SSD (~1GB/s) | 150,000ns (150μs) | 1.7 days |
| Read 1MB sequentially from memory | 250,000ns (250μs) | 2.9 days |
| Packet round trip within the same data center | 500,000ns (500μs) | 5.8 days |
| Read 1MB sequentially from SSD (~1GB/s) | 1,000,000ns (1ms) | 11.6 days |
| Disk seek | 10,000,000ns (10ms) | 3.8 months |
| Read 1MB sequentially from disk | 20,000,000ns (20ms) | 7.9 months |
| Packet round trip from the US to the Netherlands | 150,000,000ns (150ms) | 4.75 years |

The third column multiplies each latency by one billion, turning it into units that are much easier to grasp intuitively.

The original source of this data is Peter Norvig's article "Teach Yourself Programming in Ten Years": http://norvig.com/21-days.html.

Based on this data, Colin Scott of Berkeley built a page that extrapolates the numbers over time: https://colin-scott.github.io/personal_website/research/interactive_latency.html. The comments in its source code explain the calculation logic in detail; for example, it assumes network bandwidth doubles every two years and DRAM bandwidth doubles every three years.

According to Colin Scott's chart, by 2021 network bandwidth, memory, SSDs, and disks have all improved by orders of magnitude, while the CPU's L1 and L2 caches have barely changed. If you're interested, click through and take a look.

The purpose of looking at this data

First of all, these numbers are certainly not exact. With so many environmental factors in play, truly precise figures hardly exist.

We look at them to understand the order of magnitude of each operation and the ratios between operations, so that we have a baseline intuition for the related topics we touch at work.

I put this data at the start of the performance optimization series mainly to convey a few ideas to you:

1) The CPU is very, very fast

Most simple instructions take the CPU only one clock cycle. When I tested on my personal computer, the CPU turboed up to 4.40GHz (see the test screenshot under point 2), which means a simple instruction takes about 1/4.4ns, roughly 0.23ns (nanoseconds).

What does that mean? For perspective: even light in a vacuum travels less than 7 centimeters in 0.23ns.
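If you want to sanity-check that arithmetic yourself, here is a minimal sketch; the 4.40GHz clock is just the figure from my test, not a universal constant:

```java
// Back-of-the-envelope check of the numbers above.
public class CpuSpeed {
    public static void main(String[] args) {
        double clockHz = 4.40e9;                  // assumed turbo frequency: 4.40 GHz
        double nsPerInstruction = 1e9 / clockHz;  // ~0.23 ns for a 1-cycle instruction

        double lightSpeedMps = 2.998e8;           // speed of light in vacuum, m/s
        double cm = lightSpeedMps * nsPerInstruction * 1e-9 * 100;

        System.out.printf("One simple instruction: %.2f ns%n", nsPerInstruction);
        System.out.printf("Light travels only %.1f cm in that time%n", cm);
    }
}
```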

2) Memory is fast, but still too slow compared to the CPU

The gap between the CPU and memory is often called the von Neumann bottleneck. To see how big it is, I ran a simple test on my own computer.

I bought the machine this year, so the hardware is fairly new, but the configuration is ordinary; take the numbers as a reference only.

The CPU is an 11th Gen Intel Core running at 3.30GHz with a 4.40GHz turbo (and the test shows it really does hit 4.40GHz); the memory is DDR4 3200MHz.

The test results are shown in the following figure:

As the figure shows, memory reads at 41GB/s, which is quite fast, but L1 cache reads at 3TB/s, so the gap is still very large.

With the CPU at 4.40GHz, a simple instruction takes about 0.23ns, while memory latency measured 88.7ns. That means fetching a byte from memory can leave the CPU waiting for roughly 390 cycles (88.7ns × 4.4 cycles/ns). Memory really is far too slow relative to the CPU.

This is exactly why L1, L2, and L3 caches were introduced. We won't dig into them here; the point is just to have a general sense of the performance gap between CPU and memory.
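You can feel this gap without special tools. Below is a rough sketch, not a rigorous benchmark (no warmup, JIT effects, etc.; use a harness like JMH for real measurements): walking a large array in order is cache- and prefetcher-friendly, while visiting the same elements in random order pays main-memory latency on almost every access.

```java
import java.util.Random;

// Rough sketch: sequential vs random traversal of a large array.
public class MemoryAccessDemo {
    static final int N = 16 * 1024 * 1024; // 16M ints = 64 MB, far bigger than any cache

    public static void main(String[] args) {
        int[] data = new int[N];
        int[] order = new int[N];
        for (int i = 0; i < N; i++) order[i] = i;

        // Fisher-Yates shuffle to build a random visiting order
        Random rnd = new Random(42);
        for (int i = N - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = order[i]; order[i] = order[j]; order[j] = t;
        }

        long sum = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) sum += data[i];        // sequential: caches + prefetch
        long t1 = System.nanoTime();
        for (int i = 0; i < N; i++) sum += data[order[i]]; // random: mostly memory latency
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d ms, random: %d ms (sum=%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sum);
    }
}
```

On a typical machine the random pass runs several times slower than the sequential one, even though both touch exactly the same elements.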

3) Disk performance is very, very slow

Everyone probably has some idea of how slow disks are. I ran a simple test on my own computer.

My machine happens to have two drives: a 256GB SSD (solid-state drive) and a 1TB HDD (mechanical hard drive).

The SSD test results are shown in the following figure:

Ignoring the effects of queue depth (Q) and thread count (T), sequential reads (SEQ) come in at 1535.67MB/s and random reads (RND) at 49.61MB/s.

Compare that with the 41GB/s memory figure above: even an SSD is an order of magnitude slower. And random reads are another order of magnitude slower than sequential reads.

The HDD test results are shown in the following figure:

Again ignoring the effects of queue depth (Q) and thread count (T), sequential reads (SEQ) come in at 183.49MB/s and random reads (RND) at 0.6MB/s.

Compared with the SSD above: sequential reads are 1535.67 vs 183.49MB/s, an order-of-magnitude gap; random reads are 49.61 vs 0.6MB/s, a two-order-of-magnitude gap.

The gap between sequential and random reads is even worse on an HDD than on an SSD: about 300x. Terrifying. But servers these days should basically all be on SSDs; if you find your company's servers are still running HDDs, it may be time to walk away.
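If you'd rather reproduce the sequential-vs-random gap in code than with a benchmarking tool, here is a rough sketch. The file name is a placeholder for a pre-created multi-GB file; for honest numbers you'd also need to drop the OS page cache between runs, and on an HDD you should lower COUNT or the random pass will take minutes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;

// Rough sketch: 4KB sequential vs random reads from one large file.
public class DiskReadDemo {
    static final int BLOCK = 4096;
    static final int COUNT = 50_000; // 50k blocks ≈ 200 MB of reads

    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of("testfile.bin"), // placeholder file
                StandardOpenOption.READ)) {
            int totalBlocks = (int) (ch.size() / BLOCK);
            ByteBuffer buf = ByteBuffer.allocate(BLOCK);

            long t0 = System.nanoTime();
            for (int i = 0; i < COUNT; i++) {                 // sequential offsets
                buf.clear();
                ch.read(buf, (long) i * BLOCK);
            }
            long t1 = System.nanoTime();

            Random rnd = new Random(42);
            for (int i = 0; i < COUNT; i++) {                 // random offsets
                buf.clear();
                ch.read(buf, (long) rnd.nextInt(totalBlocks) * BLOCK);
            }
            long t2 = System.nanoTime();

            System.out.printf("sequential: %.1f MB/s, random: %.1f MB/s%n",
                    mbPerSec(t1 - t0), mbPerSec(t2 - t1));
        }
    }

    static double mbPerSec(long nanos) {
        return (double) COUNT * BLOCK / 1e6 / (nanos / 1e9);
    }
}
```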

4) Disk sequential I/O is much faster than random read I/O

We saw this in the tests above: an order-of-magnitude gap, and even larger on the HDD. Many technologies exploit the strong performance of sequential I/O to improve throughput; typical examples are Kafka's sequential log writes and the LSM-Trees underlying LevelDB and RocksDB, a pattern sketched below.
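To get a feel for the pattern, here is a minimal append-only log sketch in the spirit of Kafka segments and LSM-tree write-ahead logs. The class and the length-prefixed record format are mine, purely illustrative; this is not Kafka's actual code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal append-only log: every record goes to the current end of the
// file, so the disk only ever sees sequential writes.
public class AppendOnlyLog {
    private final FileChannel channel;

    public AppendOnlyLog(Path path) throws IOException {
        channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    public void append(byte[] record) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + record.length);
        buf.putInt(record.length); // length prefix so records can be replayed later
        buf.put(record);
        buf.flip();
        channel.write(buf);        // always appended at the end: sequential I/O
    }

    public void flush() throws IOException {
        channel.force(false);      // fsync to make the writes durable
    }
}
```

Because appends never seek, a design like this turns what would be scattered random writes into one sequential stream, which is exactly where the HDD numbers above show a ~300x advantage.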

5) Network transmission is also time-consuming, basically at the millisecond level

As the table at the beginning shows, a packet round trip within the same data center takes about 0.5ms.

Across cities it takes longer, which isn't hard to understand: the signal has to travel down the wire, and the farther the distance, the longer it takes.

The figure below shows PING times from Shanghai to various cities. Zhangjiakou already takes about 30ms, roughly the latency of going all the way up north.

This is why server routing strategies usually prefer the same equipment room first, then the same data center.

This reminds me of a problem I ran into before. A new service was under test, the database held almost no data, and the test scenarios were simple CRUD, yet the interface performed terribly, routinely taking hundreds of milliseconds.

Looking carefully at the call chain, I found each DB operation took about 30ms. Checking the machine-room layout, it turned out the application server and the database server were in different cities, one in Beijing and one in Shanghai, adding a fixed delay of about 30ms to every call. After moving both into the same machine room, calls dropped to about 1ms.
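When you suspect this kind of problem, a crude check is to time TCP connects to the suspect host: establishing a connection costs roughly one network round trip, so the timings expose a fixed cross-city RTT like the ~30ms above. A sketch, with host and port as placeholders for your own environment (the port must be reachable or connect will throw):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Rough RTT check: a TCP connect is roughly one network round trip.
public class RttCheck {
    public static void main(String[] args) throws IOException {
        String host = "db.example.internal"; // placeholder for your DB host
        int port = 3306;                     // placeholder port
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 1000); // 1s timeout
            }
            System.out.printf("connect #%d: %.1f ms%n", i,
                    (System.nanoTime() - start) / 1e6);
        }
    }
}
```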

Summary

This article covered some core numbers that business developers need to internalize for performance optimization. I put it first because, in my experience, most performance problems turn out to be caused by network I/O and disk I/O. Knowing these magnitudes helps us locate performance bottlenecks faster and solve problems sooner.

Some students may ask: when is the next article coming — not two months from now, I hope?

A: Let me be honest about why the update interval is so long. Partly I've gotten lazy, no question about it, and I've been distracted by other things, like the daily workouts I started a few months ago. At bottom, a big part of it is motivation.

The other reason is that work is genuinely busy. The projects I'm on now are more challenging, with hundreds of thousands to millions of QPS and billions of rows of data, where many problems look completely different, so there's a lot to do, the challenges are bigger, and it takes more time to think things through.

So that's the situation. Many students are in the habit of reading for free, and I have no objection, since I freeload all the time myself. But if you really find this series helpful and want faster updates, the best way is to give me some feedback — you know, the one-click kind.

Feedback is how I know whether the articles help. The more good feedback I see, the more diligently I'll update; I might even pull all-nighters. I scare myself.

Finally

I'm Jon Hui, a programmer who insists on sharing original, practical technical content.

Recommended reading

Java basics: high-frequency interview questions (2021 edition)

Java Collections Framework: high-frequency interview questions (2021 edition)

MySQL, a must-ask in interviews — do you really understand it?

Thread pools, a must-ask in interviews — do you really understand them?


Source: blog.csdn.net/v123411739/article/details/120817579