Understanding High Performance and High Concurrency from the Root (2): Diving into the Operating System to Understand I/O and Zero-Copy Technology

1. Introduction to the series

1.1 Purpose of the article

As a developer of instant messaging technology, you have long been surrounded by the concepts associated with high performance and high concurrency: thread pools, zero-copy, multiplexing, event-driven programming, epoll, and so on. You may also be familiar with frameworks built around these techniques, such as Java's Netty, PHP's Workman, and Go's nget. Yet when an unresolved question comes up in an interview or in practice, you realize that what you know is only skin deep.

So let's return to the basics: what are the underlying principles behind these techniques? How can they be understood in an accessible, effortless way? That is exactly what this series, "Understanding High Performance and High Concurrency from the Root", sets out to share.

1.2 Origin of the article

I have compiled many resources and articles on IM, message push, and other instant messaging technologies: from the open-source IM framework MobileIMSDK, to an online edition of the classic network programming work "TCP/IP Detailed Explanation", to the IM development guide "One Entry Is Enough for Beginners: Develop Mobile IM from Scratch", and the article series "Introduction to Network Programming for Lazy People", "Introduction to Brain-Disabled Network Programming", "High-Performance Network Programming", and "Little-Known Network Programming".

The deeper I went, the more I felt how little I knew about instant messaging technology. So later, to help developers better understand the characteristics of networks (especially mobile networks) from the perspective of basic telecommunications technology, I collected and compiled the cross-disciplinary series "Introduction to Zero-Basis Communication Technology for IM Developers". For an ordinary instant messaging developer, that series already marks the outer boundary of network communication knowledge; together with the earlier network programming materials, it is basically enough to cover the blind spots in network communication.

For developing instant messaging systems such as IM, knowledge of network communication is indeed important. But let's return to the essence of the technology: what is the nature of the techniques that implement network communication itself, including the thread pools, zero-copy, multiplexing, and event-driven programming mentioned above? What are their underlying principles? Answering these questions is the purpose of this series, and I hope it will be useful to you.

1.3 Article directory

1.4 Overview of this article

Following the previous article, "Diving into the Bottom of the Computer to Understand Threads and Thread Pools", this second article in the series turns to the topic of I/O. Have you ever wondered what happens at the bottom of the computer when we perform file I/O or network I/O? I/O is extremely important to computers, and this article will answer that question.

2. The author of this article

At the author's request, no real name or personal photo is provided.

The author's main technical areas are Internet back ends, high-concurrency and high-performance servers, and search engine technology. He writes under the screen name "Coder's Deserted Island Survival" and runs a public account of the same name. Thanks to the author for his generous sharing.

3. What would a computer be without I/O?

I believe I/O operations are among the things most familiar to programmers, for example:

  • 1) when we use printf in C, "<<" in C++, print in Python, System.out.println in Java, and so on;
  • 2) when we read and write files in any language;
  • 3) when we communicate over the network via TCP/IP;
  • 4) when we move the mouse around;
  • 5) when we pick up the keyboard to comment away or diligently create bugs;
  • 6) when we see a beautiful graphical interface on the screen; and so on.

All of the above is I/O!

Think about it: without I/O, what a boring device a computer would be. You couldn't watch movies, play games, or surf the Internet. Such a computer would be, at best, a large calculator.

Since I/O is so important, what exactly is I/O?

4. What is I/O?

I/O is a simple data copy, nothing more!

This point is very important!

Since I/O is copying data, copied from where to where?

If the data is copied from an external device to the memory, this is Input.

If the data is copied from the memory to the external device, this is Output.

Copying data back and forth between memory and external devices is Input and Output, referred to as I/O (Input/Output), nothing more.
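The definition above can be made concrete with a few lines of Python (the file name is made up for illustration): writing is a copy from memory to the disk, reading is a copy from the disk back into memory.

```python
# Input and Output are both data copies, per the definition above.
with open("example.txt", "w") as f:   # Output: memory -> external device (disk)
    f.write("hello I/O")

with open("example.txt") as f:        # Input: external device -> memory
    data = f.read()

print(data)  # hello I/O
```

Every print, network send, and mouse event in the bullet list earlier reduces to exactly this kind of copy.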

5. I/O and the CPU

Now that we know what I/O is, the next part is the important one, so pay attention.

We know that the clock frequency of a modern CPU starts at several GHz. What does this mean?

To put it simply: the CPU executes machine instructions at the nanosecond level, while common I/O such as a disk operation takes milliseconds for a single seek. If we compare the CPU's speed to a fighter jet, an I/O operation moves at a walking pace.

In other words, when our program runs (that is, when the CPU executes machine instructions), it runs far faster than I/O. The next question then is: given such a huge speed gap between the two, how do we design systems that use resources more reasonably and efficiently?
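To make the gap concrete, here is a back-of-the-envelope calculation. The figures are illustrative assumptions (roughly 1 ns per instruction, roughly 10 ms per disk seek), not measurements of any particular machine:

```python
# Illustrative, assumed figures: a machine instruction on the order
# of 1 ns, a single disk seek on the order of 10 ms.
instruction_ns = 1
disk_seek_ns = 10 * 1_000_000  # 10 ms expressed in nanoseconds

ratio = disk_seek_ns // instruction_ns
print(ratio)  # 10000000: one seek "costs" about 10 million instructions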

Since there is a speed gap, and the process cannot move forward until the I/O operation completes, there is obviously only one option: to wait.

But there is waiting and there is waiting: you can wait smartly, or you can wait foolishly ("dumb waiting" for short). So, should you choose smart waiting or dumb waiting?

Suppose you are a short-tempered person (the CPU) waiting for an important document, which unfortunately can only arrive by courier (the I/O). Do you choose to do nothing, staring affectionately at the door like a lover awaiting your return, waiting single-mindedly for the courier? Or do you forget about the delivery for now, play a game, watch a movie or a few short videos, and deal with the package when it arrives?

Obviously, the better approach is to do something else first and deal with the package when it arrives.

The key point here is that the task at hand can be paused before the package arrives: we switch to another task, and switch back when the delivery comes.

Once you understand this, you can understand what happens at the bottom layer when an I/O operation is performed.

Next, let us take reading a disk file as an example to explain this process.

6. What happens at the bottom when I/O is performed

In the previous article, "Diving into the Bottom of the Computer to Understand Threads and Thread Pools", we introduced the concepts of processes and threads.

In an operating system that supports threads, it is actually threads that are scheduled, not processes. To make the I/O process clearer, we will temporarily assume the operating system only has the concept of processes and not consider threads; this does not affect our discussion.

Now there are two processes in memory, process A and process B, and process A is currently running.

As shown below:

Process A contains a piece of code that reads a file. Regardless of the language, we usually define a buffer (buff) to hold the data and then call a function such as read.

like this:

read(buff);

This is a typical I/O operation. When the CPU executes this code, it sends a read request to the disk.

Note: compared with the speed at which the CPU executes instructions, I/O is very slow, so the operating system cannot waste precious CPU cycles on pointless waiting. Here comes the key point, so pay attention to what follows.

Because the external device performs I/O very slowly, the process cannot continue until the I/O operation completes. This is so-called blocking, commonly called a block.
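Blocking is easy to observe from user space. In this sketch, `os.read()` on a pipe suspends the caller until data arrives, just as the `read(buff)` call above suspends process A until the disk delivers the data (the 0.1-second delay stands in for the slow device):

```python
import os
import threading
import time

# os.read() on a pipe blocks the caller until data arrives,
# mimicking process A blocking on a disk read.
r, w = os.pipe()

def slow_device():
    time.sleep(0.1)          # the "courier" is on the way
    os.write(w, b"payload")  # the device finally delivers the data

threading.Thread(target=slow_device).start()

start = time.monotonic()
data = os.read(r, 1024)      # blocks here until the write happens
elapsed = time.monotonic() - start

print(data)                  # b'payload'
print(elapsed > 0.05)        # True: the read really did wait
```

While this thread sits inside `os.read()`, it consumes no CPU; the operating system has suspended it, which is exactly what the next paragraphs describe.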

When the operating system detects that a process has issued a request to an I/O device, it suspends that process's execution. How? Very simply: it records the current running state of the process and points the CPU's PC register at another process's instructions.

A suspended process will resume execution later, so the operating system must save it for subsequent execution. Naturally, we can use a queue to hold suspended processes.

As shown in the figure below, process A is suspended and placed in the blocked queue (note: implementations differ between operating systems, and each I/O device may have its own blocked queue, but these implementation details do not affect our discussion).

Meanwhile, the operating system has already sent the I/O request to the disk, so the disk driver starts copying data from the disk into process A's buff. Although process A's execution is suspended, this does not prevent the disk from copying data into memory.

Note: a modern disk does not need the CPU's help to copy data into memory. This is so-called DMA (Direct Memory Access).

This process is shown in the figure below:

Let the disk get on with the copy while we continue.

In fact, besides the blocked queue, the operating system also has a ready queue. The ready queue holds processes that are ready to be executed by the CPU.

You may ask: why have a ready queue instead of executing these processes directly? The answer is simple: there are more mouths than porridge. Even a machine with a single core can create thousands of processes, and the CPU cannot execute them all at once. So there must be processes that, even though everything about them is ready, cannot yet be allocated computing resources; such processes are placed in the ready queue.

Now process B is in the ready queue: everything is ready, and all it lacks is the CPU.

As shown below:

When process A is suspended, the CPU cannot sit idle, because process B is waiting to be fed in the ready queue. The operating system looks in the ready queue for the next runnable process, here process B.

The operating system removes process B from the ready queue, finds the location of the machine instruction at which B was last suspended, points the CPU's PC register at that location, and process B starts running.

As shown below:

Note: The next paragraph is the key point!

Look at the figure above: process B is being executed by the CPU while the disk is copying data into process A's memory space. Do you see it? Everyone is busy and no one is idle; data copying and instruction execution happen at the same time. Under the scheduling of the operating system, the CPU and the disk are both fully utilized. This is where the programmers' wisdom lies.

Now you should understand why the operating system is so important.

Eventually, the disk finishes copying all the data into process A's memory, and it notifies the operating system that its task is complete. How does it notify? With an interrupt.

When the operating system receives the disk interrupt, it finds that the data copy is complete, and process A regains the qualification to run. The operating system then moves process A from the blocked queue to the ready queue.

As shown below:

Note: from the earlier discussion of the ready state, we know the operating system will not run process A directly; process A must wait its turn in the ready queue, which is fair to everyone.

After that, process B continues to execute and process A continues to wait. After process B has run for a while, the operating system decides it has executed long enough, puts it back in the ready queue, takes process A out, and resumes A's execution.

Note: the operating system puts process B back in the ready queue, because B is paused only because its time slice is up, not because it blocked on an I/O request.

As shown below:

Process A continues executing. At this point its buff is already filled with the desired data, and process A runs on happily as if it had never been suspended; the process knows nothing about having been paused. This is the magic of the operating system.

Now you should understand what goes on during an I/O operation.

This way of performing I/O, in which the process is blocked and its execution suspended, is called blocking I/O. It is the most common and easiest-to-understand form of I/O. Its counterpart is non-blocking I/O, which we will not consider here.

At the beginning of this section we said we would only consider processes, not threads. Now we can relax that condition, and it is actually very simple: we only need to change the processes scheduled in the figures above to threads, and the discussion holds for threads just as well.

7. Zero-copy

One last thing to note: in the explanation above, the disk data was copied directly into the process's address space, but in general, I/O data is first copied into the operating system's (kernel's) buffers, and only then copied from the operating system into the process's address space.

So we can see that there is actually an extra layer of copying through the operating system. For scenarios with high performance requirements, this extra copy into the process's address space can be skipped, letting the data go directly to where it is needed. The technique of eliminating this redundant copy is called zero-copy. It is commonly used in high-concurrency, high-performance scenarios, and the principle is very simple.
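One common realization of zero-copy is the sendfile system call, exposed in Python as os.sendfile. Instead of read()-ing the data into a user-space buffer and then write()-ing it back out (two copies through the process), the kernel transfers the bytes internally. A minimal sketch, assuming Linux and with made-up file names:

```python
import os

# Zero-copy sketch using os.sendfile (Linux): the kernel moves the
# bytes from src to dst directly, without the data ever entering this
# process's user-space buffers.
payload = b"hello zero-copy" * 100

with open("src.bin", "wb") as f:
    f.write(payload)

src = os.open("src.bin", os.O_RDONLY)
dst = os.open("dst.bin", os.O_WRONLY | os.O_CREAT | os.O_TRUNC)

sent = 0
size = os.fstat(src).st_size
while sent < size:  # sendfile may transfer fewer bytes than requested
    sent += os.sendfile(dst, src, sent, size - sent)

os.close(src)
os.close(dst)

with open("dst.bin", "rb") as f:
    print(f.read() == payload)  # True: same bytes, no user-space copy
```

In network servers the destination is typically a socket rather than a file, which is exactly how frameworks such as Netty ship file contents to clients without the extra copy.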

PS: For Java programmers engaged in instant messaging development, the well-known high-performance network framework Netty uses zero-copy technology. For details, read section 12 of "NIO Framework Explained: Netty's High-Performance Way". If you are curious about but unfamiliar with Netty, these two articles will get you started: "Beginners: The Most Thorough Analysis of Netty's High-Performance Principles and Framework So Far" and "The Most Popular Netty Introduction in History: Basics, Environment Setup, Hands-On Practice".

8. Summary of this article

This article has explained the I/O that programmers use every day (including so-called network I/O). Generally speaking, as application programmers we do not need to worry about it, but understanding the underlying principles behind I/O is extremely useful when designing high-performance, high-concurrency systems such as IM. I hope this article has deepened your understanding of I/O.

The next article, "Understanding High Performance and High Concurrency from the Root (3): Going Deep into the Operating System for a Thorough Understanding of I/O Multiplexing", will share a breakthrough in I/O technology. It is precisely this technique that solved the C10K problem in high-concurrency network communication (see "High-performance network programming (2): The famous C10K concurrent connection problem in the last 10 years"), so stay tuned!

Appendix: More high-performance, high-concurrency articles

" High-performance network programming (1): How many concurrent TCP connections can a single server have "

" High-performance network programming (2): The famous C10K concurrent connection problem in the last 10 years "

" High-Performance Network Programming (3): In the next 10 years, it's time to consider C10M concurrency "

" High-performance network programming (4): Theoretical exploration of high-performance network applications from C10K to C10M "

" High-Performance Network Programming (5): Reading the I/O Model in High-Performance Network Programming in One Article "

" High-Performance Network Programming (6): Understanding the Thread Model in High-Performance Network Programming in One Article "

" High-performance network programming (7): What is high concurrency? Understand it in one sentence! "

" Take the network access layer design of the online game server as an example to understand the technical challenges of real-time communication "

" Knowing the technology sharing: knowing the practice of high-performance long-connection gateway technology with tens of millions of concurrency "

" Taobao Technology Sharing: The Technological Evolution Road of the Mobile Access Layer Gateway of the Hand Taobao Billion Level "

" A set of mobile IM architecture design practice sharing for massive online users (including detailed graphics and text) "

"An Original Distributed Instant Messaging (IM) System Theoretical Architecture Plan "

" WeChat background based on the time series of massive data cold and hot hierarchical architecture design practice "

" WeChat Technical Director Talks about Architecture: The Way of WeChat-Dao Zhi Jian (Full Speech) "

" How to Interpret "WeChat Technical Director Talking about Architecture: The Way of WeChat-The Road to the Simple" "

" Rapid Fission: Witness the evolution of WeChat's powerful back-end architecture from 0 to 1 (1) "

" 17 Years of Practice: Technical Methodology of Tencent's Massive Products "

" Summary of Tencent's Senior Architect Dry Goods: An article to understand all aspects of large-scale distributed system design "

" Take Weibo application scenarios as an example to summarize the architectural design steps of massive social systems "

" Getting Started: A Zero-Basic Understanding of the Evolution History, Technical Principles, and Best Practices of Large-scale Distributed Architectures "

" From novice to architect, one piece is enough: the evolution of architecture from 100 to 10 million high concurrency "

This article has been simultaneously published on the official account of "Instant Messaging Technology Circle".

▲ The synchronized publication link for this article is: http://www.52im.net/thread-3280-1-1.html
