Understanding High Performance and High Concurrency from the Root (5): Going Deep into the Operating System to Understand Coroutines in High Concurrency

The original title of this article is "How Should Programmers Understand Coroutines in High Concurrency?". Please contact the author for permission to reprint.

1. Introduction to the series

1.1 Purpose of the article

As a developer of instant messaging technology, you have long been exposed to the technical concepts related to high performance and high concurrency: thread pools, zero copy, multiplexing, event-driven I/O, epoll, and so on are all at your fingertips, and perhaps you are also familiar with frameworks built around these techniques, such as Java's Netty, PHP's Workerman, and Go's gnet. But when you face an interview or real engineering practice and run into questions you cannot resolve, you realize that what you have is only a surface-level understanding.

Returning to the basics and to the essence: what are the underlying principles behind these technical features? How to understand them in an easy, approachable way is exactly what this series, "Understanding High Performance and High Concurrency from the Root", sets out to share.

1.2 Origin of the article

I have compiled many resources and articles on IM, message push, and other instant messaging technologies: from the early open source IM framework MobileIMSDK, to the online edition of the classic network programming work "TCP/IP Detailed Explanation", to the IM development guide "One Entry Is Enough for Beginners: Develop Mobile IM from Scratch", as well as the article series "Introduction to Network Programming for Lazy People", "Introduction to Brain-Disabled Network Programming", "High-Performance Network Programming", and "Little-Known Network Programming".

The deeper you go into this knowledge, the more you feel how little you know about instant messaging technology. So later, to help developers better understand the characteristics of networks (especially mobile networks) from the perspective of basic telecommunications technology, I collected and compiled the cross-disciplinary series "Introduction to Zero-Basic Communication Technology for IM Developers". For ordinary instant messaging developers, that series already marks the outer boundary of network communication knowledge; together with the earlier network programming materials, it is basically enough to cover the blind spots in network communication.

For developing instant messaging systems such as IM, knowledge of network communication is indeed important, but let us return to the essence of the technologies that implement network communication itself: the thread pools, zero copy, multiplexing, and event-driven mechanisms mentioned above. What is their nature? What are the underlying principles? Answering these questions is the purpose of this series, and I hope it will be useful to you.

1.3 Article directory

" Understanding high performance and high concurrency from the root (1): Going deep into the bottom of the computer, understanding threads and thread pools "

" Understanding high performance and high concurrency from the root (2): In-depth operating system, understanding I/O and zero copy technology "

" Understanding high performance and high concurrency from the root (3): In-depth operating system, thorough understanding of I/O multiplexing "

" Understanding high performance and high concurrency from the root (4): In-depth operating system, thorough understanding of synchronization and asynchrony "

" Understanding High Performance and High Concurrency from the Root (5): In-depth operating system and understanding of coroutines in high concurrency " (* This article)

"Understanding high performance and high concurrency from the root (6): How is high concurrency and high performance server realized (to be released later..)"

1.4 Overview of this article

Following the previous article "In-Depth Operating System, a Thorough Understanding of Synchronization and Asynchrony", this is the fifth article in this high-performance, high-concurrency series.

Coroutines are an indispensable technology in high-performance, high-concurrency programming, and they are widely used in Internet products, including instant messaging (IM) systems. For example, the back-end framework that reportedly supports WeChat's massive user base is built on coroutines (see "Open Source libco Library: The Cornerstone of the Back-End Framework That Supports Tens of Millions of Connections on a Single Machine for 800 Million WeChat Users"). More and more modern programming languages also treat coroutines as an important language feature, including Go, Python, and Kotlin.

Therefore, understanding and mastering coroutines is quite necessary for many programmers (especially back-end programmers working on large-scale network communication applications), and this article is written precisely to demystify the principles behind coroutine technology.

2. The author of this article

At the request of the author, no real names or personal photos are provided.

The author's main technical interests are Internet back-end development, high-concurrency and high-performance servers, and search engine technology. His screen name is "Coder's Deserted Island Survival", which is also the name of his public account. Thanks to the author for his selfless sharing.

3. Introduction

As a programmer, you have probably heard the word "coroutine" at least in passing. In recent years this technology has appeared more and more often in programmers' field of view, especially around high performance and high concurrency. If your mind goes blank whenever classmates or colleagues mention coroutines...

Then this article is tailored for you.

Without further ado, today's topic is how a programmer should thoroughly understand coroutines.

4. Ordinary functions

Let's first look at an ordinary function; it is very simple:

def func():
    print("a")
    print("b")
    print("c")

This is a simple, ordinary function. Suppose some function A calls it. What happens?

  • 1) Function A calls func;
  • 2) func starts executing and runs until it returns;
  • 3) after func finishes, control returns to function A.

Very simple: the function func runs until it returns, printing:

a

b

c

So easy, right?

Good!

Note that this code is written in Python, but the discussion applies to any language, because coroutines are not specific to one language; we just happen to use Python because it is simple enough.

So what is a coroutine?

5. From ordinary functions to coroutines

Next we transition from ordinary functions to coroutines. Unlike an ordinary function, which has only one return point, a coroutine can have multiple return points.

What does it mean?

void func() {
    print("a")
    Pause and return
    print("b")
    Pause and return
    print("c")
}

In an ordinary function, func returns only after the statement print("c") has executed; in a coroutine, once print("a") has executed, func hits the "pause and return" line and control goes back to the calling function.

Some readers may be unimpressed: what is so magical about this?

I can also return by writing a return, like this:

void func() {
    print("a")
    return
    print("b")
    Pause and return
    print("c")
}

It is true that you can write a return statement directly, but written this way, the code after the return will never execute.

What makes the coroutine magical is that after it returns, we can call it again, and it will continue executing from its last return point.

Just as when the Monkey King casts his "freeze" spell, the function is suspended in place:

void func() {
    print("a")
    freeze
    print("b")
    freeze
    print("c")
}

At that point control returns to the calling function. As long as the calling function keeps a reference to the coroutine, it can call the coroutine again later, and the coroutine will continue from its last return point.

Amazing, right? Stay focused; don't lose the thread here.

The Monkey King's "freeze" spell is generally called yield in programming languages (different languages may implement it differently, but the essence is the same).

Note the key difference: when an ordinary function returns, none of its runtime information is kept in the process address space; when a coroutine returns (suspends), its runtime information must be saved.

Next, let's look at coroutines in actual code.

6. "Talk is cheap, show me the code"

Below we explain with a real example. The language is Python; if you are not familiar with it, don't worry, there is nothing here that will block your understanding.

In Python, the "freeze" spell is spelled with the keyword yield.

So our func function becomes:

def func():
    print("a")
    yield
    print("b")
    yield
    print("c")

Note: at this point func is no longer a simple function; it has been upgraded to a coroutine (in Python terms, a generator). So how do we use it?

It's simple:

def A():
    co = func()              # get the coroutine
    next(co)                 # call the coroutine
    print("in function A")   # do something
    next(co)                 # call the coroutine again

We can see that although func has no return statement, that is, it does not explicitly return any value, we can still write co = func(); co is the coroutine we obtain. (Calling func() does not run its body; it only creates the coroutine.)

Next we call the coroutine with next(co). Run function A and see what has been printed once line 3 (the first next(co)) has executed:

a

As expected, the coroutine func executes print("a"), hits the yield, pauses, and returns to function A.

Next is line 4: function A simply does its own work, so the output becomes:

a

in function A

Now comes the key line: when line 5 executes and the coroutine is called again, what should be printed?

If func were a normal function, its first line of code would execute again, printing a.

But func is not a normal function; it is a coroutine. As we said before, a coroutine continues from its last return point, so what executes here is the code after the first yield in func, namely print("b"). The output becomes:

a

in function A

b

As you can see, a coroutine is a rather magical kind of function: it remembers its previous execution state on its own, and when called again it continues from the last return point.
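
To make this concrete, here is a complete, runnable version of the example above as a minimal sketch (same func and A as in the article; the comments note what each call does):

    def func():
        print("a")
        yield          # first suspension point
        print("b")
        yield          # second suspension point
        print("c")

    def A():
        co = func()              # creates the coroutine; nothing in func runs yet
        next(co)                 # run func until the first yield -> prints "a"
        print("in function A")   # function A does its own work
        next(co)                 # resume func from the first yield -> prints "b"

    A()
    # Output:
    # a
    # in function A
    # b

If you called next(co) a third time, "c" would be printed, and since the coroutine would then have run to completion, Python would raise StopIteration.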

7. Graphical explanation

To give you a more thorough understanding of coroutines, let's look at them again graphically.

The first is the ordinary function call:

In the figure, each box represents a function's instruction sequence. If a function calls no other functions, it simply executes from top to bottom; but a function may call other functions, so execution is not purely top-down. The arrowed lines indicate the direction of the execution flow.

From the figure we can see: execution first enters funcA; after running for a while it calls another function, funcB; control transfers to funcB, and when funcB finishes, control returns to the call point in funcA and continues. This is an ordinary function call.

Next is the coroutine:

Here, execution again starts in funcA; after running for a while it calls the coroutine; the coroutine executes until its first suspension point and then, just like an ordinary function, returns to funcA; funcA executes some code and then calls the coroutine again.

Note: this is where the coroutine differs from an ordinary function. The coroutine does not start from its first instruction but resumes from its last suspension point. After running for a while it hits the second suspension point and again returns to funcA like an ordinary function; funcA runs a bit longer and then the whole program ends.

8. Function is just a special case of coroutine

So, magical or not? Unlike an ordinary function, a coroutine knows where it left off last time.

By now you should understand: when a coroutine is suspended, it saves the function's running state, and it can later resume from that saved state and continue running.

Does this feel familiar? Isn't this exactly how the operating system schedules threads (see "Going Deep into the Bottom of the Computer, Understanding Threads and Thread Pools")? A thread can also be suspended: the operating system saves the thread's running state and schedules other threads, and later, when the thread is given the CPU again, it continues to run as if it had never been suspended.

The difference is that thread scheduling is implemented by the operating system and is invisible to the programmer, while coroutines are implemented in user mode and are visible to the programmer.

This is why some people say that coroutines can be understood as user-mode threads.

There should be applause here.

In other words, programmers can now play the role of the operating system. You can control when the coroutine runs and when it is suspended, which means that the scheduling power of the coroutine is in your hands.

In the case of coroutines, you have the final say in scheduling.

When you write yield in a coroutine, you are choosing to suspend it; when you call next(), you are choosing to run it again.
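
To show what "the scheduling power is in your hands" looks like in code, here is a minimal sketch of a hand-written round-robin scheduler (worker and scheduler are names made up for this illustration): every yield hands control back, and every next() hands it out again.

    def worker(name):
        for step in range(3):
            print(name, "step", step)
            yield                      # suspend and give control back to the scheduler

    def scheduler(coroutines):
        # A deliberately naive round-robin scheduler: resume each coroutine
        # in turn until every one of them has run to completion.
        pending = list(coroutines)
        while pending:
            for co in list(pending):
                try:
                    next(co)           # resume from the last yield
                except StopIteration:  # this coroutine has finished
                    pending.remove(co)

    scheduler([worker("co-1"), worker("co-2")])
    # The two workers' steps are interleaved, and the interleaving is decided
    # entirely by our scheduler code, not by the operating system.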

Now you should understand why a function is just a special case of a coroutine: a function is simply a coroutine with no suspension points.

9. The history of coroutines

Some readers may think coroutines are a relatively new technology, but in fact the concept of the coroutine was proposed as early as 1958, before the concept of the thread had even been proposed.

It was not until 1972 that programming languages finally implemented the concept; the two languages were Simula 67 and Scheme.

However, coroutines never became popular; as late as 1993 people were still writing papers to dig up this ancient technique, almost like archaeology.

In that era there were no threads, so if you wanted to write a concurrent program you had to rely on techniques like coroutines. Later, threads appeared and operating systems finally began to support concurrent execution of programs natively, and coroutines gradually faded from programmers' sight.

In recent years, with the development of the Internet and especially the arrival of the mobile Internet era, server-side demands for high concurrency have kept rising, coroutines have returned to the technical mainstream, and major programming languages either already support them or plan to.

So how is the coroutine implemented?

10. How is the coroutine implemented?

Let's think about this question starting from its essence: what is the essence of a coroutine?

A coroutine is, in essence, a function that can be suspended and resumed. So what does it mean to be able to suspend and resume?

Anyone who has watched a basketball game knows (and even if you haven't, you can imagine) that a game can be paused at any time. During the pause, everyone remembers which side has the ball and where each player is standing; when the game resumes, the players return to their positions, the referee's whistle blows, and the game continues as if it had never been paused.

Do you see the key? The game can be paused and resumed because its state is recorded (the players' positions, which side has the ball); this state is what computer science calls the context.

Back to the coroutine.

For a coroutine to be suspended and resumed, the state at the moment of suspension, that is, the context, must be recorded, and that context (state) must be restored when the coroutine continues to run. Furthermore, all of a function's state information at run time is located in the function's runtime stack.

The function's runtime stack is exactly the state we need to save, that is, the so-called context.
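
In CPython you can actually peek at this saved context: a suspended generator keeps its frame alive on the gi_frame attribute, and the frame's local variables and resume position are exactly the "state" we have been talking about. A small sketch (CPython-specific introspection, for illustration only):

    def counter():
        total = 0
        while True:
            total += 1
            yield total

    co = counter()
    next(co)   # run to the first yield; total == 1 is now saved in the frame
    next(co)   # resume once more; total == 2

    frame = co.gi_frame              # the suspended frame, i.e. the saved context
    print(dict(frame.f_locals))      # {'total': 2}, the saved local variables
    print(frame.f_lasti)             # the saved bytecode offset: where to resume from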

As the following figure shows:

From the figure we can see that the process has only one thread and there are four stack frames in the stack area: the main function calls function A, A calls B, and B calls C. While C is running, the state of the whole process is as shown.

Now we know that a function's runtime state is stored in its stack frame in the stack area. Here comes the key point.

Since the function's runtime state is stored in a stack frame in the stack area, suspending a coroutine means saving the data of its entire stack frame. But where should that data be saved?

Think about it: which region of the process's memory is used to store data for a long time (for the lifetime of the process)? Is your mind going blank again?

Don't go blank!

Obviously, it is the heap. We can store stack frames in the heap area. So how do we store data there? I hope you are not dizzy yet: allocating space on the heap is just malloc in C or new in C++.

So what we need to do is: allocate a piece of space in the heap, save the coroutine's entire stack data there, and when the coroutine needs to resume, copy the data back from the heap to restore the function's runtime state.

But think again: why go to the trouble of copying data back and forth?

In fact, we can simply allocate the stack space the coroutine needs directly in the heap area, so that no copying back and forth is needed at all, as shown in the following figure.

From the figure we can see that the program has opened two coroutines, and the stacks of both coroutines are allocated on the heap, so we can interrupt or resume their execution at any time.
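
We can observe this heap allocation indirectly even in Python: a generator's frame is an ordinary heap object, so it survives the return of the function that created it, something that could never happen if it lived on the caller's call stack. A minimal sketch (make_coroutine and co_body are illustrative names):

    def make_coroutine():
        def co_body():
            x = 42
            yield                      # suspend, with x saved in the heap-allocated frame
            print("resumed, x =", x)
        co = co_body()
        next(co)                       # run co_body up to its yield
        return co                      # make_coroutine's own stack frame disappears here

    co = make_coroutine()
    # make_coroutine has already returned, yet the coroutine's saved state is intact,
    # because it lives on the heap rather than on the thread's call stack.
    try:
        next(co)                       # prints: resumed, x = 42
    except StopIteration:
        pass                           # the coroutine ran to completion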

Some readers may ask: what is the stack area at the top of the process address space now used for?

The answer: that area still holds function stack frames, but those are the frames of functions running in the ordinary thread, not in coroutines.

You should see it by now: there are actually three execution flows in the figure above:

  • 1) A normal thread;
  • 2) Two coroutines.

Although there are three execution flows, how many threads have we created?

The answer: one thread.

Now you should understand why coroutines are worth using: in theory, as long as heap space is sufficient we can open any number of concurrent execution flows with coroutines, with no thread-creation overhead, and all coroutine scheduling and switching happens in user space. This is why coroutines are also called user-mode threads.

Where is the applause?

Therefore, even if you create any number of coroutines, the operating system still sees only one thread; coroutines are invisible to the operating system.
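
A quick way to convince yourself of this is to create a large number of coroutines and then ask how many threads exist. A rough sketch (co_body is an illustrative name; exact memory cost varies by Python version):

    import threading

    def co_body(i):
        yield i

    # 100,000 suspended coroutines: each is just a small heap object,
    # and not a single extra OS thread is created for them.
    coroutines = [co_body(i) for i in range(100_000)]

    print("coroutines created:", len(coroutines))               # 100000
    print("Python threads alive:", threading.active_count())    # 1, just the main thread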

This is probably also why the concept of coroutines appeared earlier than threads: application programmers ran into the need for multiple parallel execution flows before operating-system programmers did, perhaps because operating systems as we know them did not yet exist, or because they did not yet have such a need for parallelism. So non-operating-system programmers had to implement execution flows themselves, and that is the coroutine.

Now you should have a clear understanding of coroutines.

11. Summary of coroutine technology concepts

The main text above uses a lot of banter, the goal being to help you grasp the concept of coroutines in an easy and light-hearted way. Now let's summarize, in more serious professional terms, what a coroutine is.

11.1 A coroutine is a smaller unit of execution than a thread

A coroutine is an execution unit smaller than a thread, and you can think of it as a lightweight thread.

Why is it lightweight? One reason is that the stack held by a coroutine is much smaller than a thread's. In Java, for example, roughly 1 MB of stack space is allocated per thread, whereas a coroutine may need only tens or hundreds of KB. The stack mainly stores function parameters, local variables, return addresses, and similar information.

We know that thread scheduling is done by the operating system, while coroutine scheduling is done in user space, by developers calling the system's low-level execution-context APIs. Some languages, such as Node.js and Go, support coroutines at the language level, while others, such as C, rely on third-party libraries for coroutine capability (WeChat's open source libco library is one example; see "Open Source libco Library: The Cornerstone of the Back-End Framework That Supports Tens of Millions of Connections on a Single Machine for 800 Million WeChat Users").

Since the thread is the smallest execution unit known to the operating system, it follows that coroutines are implemented on top of threads: a coroutine is created, switched, and destroyed entirely within some thread.

Coroutines are used because thread switching is relatively expensive, and this is exactly where coroutines have an advantage.
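
As a quick illustration of "created, switched, and destroyed within one thread", here is a small asyncio sketch in which every coroutine reports the same OS thread id (asyncio is just one concrete coroutine implementation; libco, goroutines, and others follow the same idea):

    import asyncio
    import threading

    async def task(name):
        # Every task prints the same thread id: all of these coroutines are
        # created, switched, and destroyed inside a single OS thread.
        print(name, "runs on thread", threading.get_ident())
        await asyncio.sleep(0)     # a suspension point: yield control to the event loop

    async def main():
        await asyncio.gather(task("co-1"), task("co-2"), task("co-3"))

    asyncio.run(main())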

11.2 Why is switching a coroutine so cheap?

To answer this, let's review what happens during a thread switch:

  • 1) when a thread is switched out, the CPU's register contents must be saved and another thread's register data loaded, which takes time;
  • 2) the data in the CPU cache may become invalid and have to be reloaded;
  • 3) a thread switch involves crossing from user mode into kernel mode; it is said that each mode switch takes thousands of instructions, which is time-consuming.

In my view, a coroutine switch is fast mainly because (a rough illustrative benchmark follows this list):

  • 1) when switching, the amount of register state that must be saved and loaded is relatively small;
  • 2) the CPU cache can still be used effectively;
  • 3) there is no user-mode to kernel-mode switch;
  • 4) scheduling is more efficient: coroutines are non-preemptive, so the CPU is released only when the running coroutine finishes or blocks, whereas threads are generally scheduled with time slices and incur many unnecessary switches (to keep any single thread from visibly stalling for the user).
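
The benchmark promised above: a rough, unscientific sketch that compares resuming a generator-based coroutine with a message round trip between two threads. The queue-based handoff also measures queue overhead, and absolute numbers vary wildly by machine, so treat the result only as an order-of-magnitude illustration that the coroutine switch is far cheaper.

    import queue
    import threading
    import time

    N = 100_000

    # Coroutine switches: resume the same generator N times.
    def co():
        while True:
            yield

    g = co()
    start = time.perf_counter()
    for _ in range(N):
        next(g)
    coroutine_time = time.perf_counter() - start

    # Thread switches: ping-pong N messages between two threads via two queues.
    q_in, q_out = queue.Queue(), queue.Queue()

    def pong():
        for _ in range(N):
            q_in.get()
            q_out.put(None)

    t = threading.Thread(target=pong)
    t.start()
    start = time.perf_counter()
    for _ in range(N):
        q_in.put(None)
        q_out.get()
    thread_time = time.perf_counter() - start
    t.join()

    print(f"{N} coroutine resumes:  {coroutine_time:.3f}s")
    print(f"{N} thread round trips: {thread_time:.3f}s")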

12. Written at the end

At this point I believe you understand what coroutines are all about. For more systematic knowledge of coroutines, please consult other materials on your own; I won't ramble on any further.

The next article will be "Understanding High Performance and High Concurrency from the Root (6): How Is a High-Concurrency, High-Performance Server Implemented?". Stay tuned!

Appendix: More high-performance, high-concurrency articles

" High-performance network programming (1): How many concurrent TCP connections can a single server have "

" High-performance network programming (2): The famous C10K concurrent connection problem in the last 10 years "

" High-Performance Network Programming (3): In the next 10 years, it's time to consider C10M concurrency "

" High-performance network programming (4): Theoretical exploration of high-performance network applications from C10K to C10M "

" High-Performance Network Programming (5): Reading the I/O Model in High-Performance Network Programming in One Article "

" High-Performance Network Programming (6): Understanding the Thread Model in High-Performance Network Programming in One Article "

" High-performance network programming (7): What is high concurrency? Understand in one sentence!

" Take the network access layer design of the online game server as an example to understand the technical challenges of real-time communication "

" Knowing the technology sharing: knowing the practice of high-performance long-connection gateway technology with tens of millions of concurrency "

" Taobao Technology Sharing: The Technological Evolution Road of the Mobile Access Layer Gateway of the Hand Taobao Billion Level "

" A set of mobile IM architecture design practice sharing for massive online users (including detailed graphics and text) "

"An Original Distributed Instant Messaging (IM) System Theoretical Architecture Plan "

" WeChat background based on the time series of massive data cold and hot hierarchical architecture design practice "

" WeChat Technical Director Talks about Architecture: The Way of WeChat-Dao Zhi Jian (Full Speech) "

" How to Interpret "WeChat Technical Director Talking about Architecture: The Way of WeChat-The Road to the Simple" "

" Rapid Fission: Witness the evolution of WeChat's powerful back-end architecture from 0 to 1 (1) "

" 17 Years of Practice: Technical Methodology of Tencent's Massive Products "

" Summary of Tencent's Senior Architect Dry Goods: An article to understand all aspects of large-scale distributed system design "

" Take Weibo application scenarios as an example to summarize the architectural design steps of massive social systems "

" Getting Started: A Zero-Basic Understanding of the Evolution History, Technical Principles, and Best Practices of Large-scale Distributed Architectures "

" From novice to architect, one piece is enough: the evolution of architecture from 100 to 10 million high concurrency "

This article has also been published on the "Instant Messaging Technology Circle" official account. The synchronized publishing link is: http://www.52im.net/thread-3306-1-1.html

Origin: blog.csdn.net/hellojackjiang2011/article/details/112781047