The Implementation Principle of Coroutines

Hello everyone, I am Yi An! Today we will discuss one question: the implementation principle of Go coroutines. (In Chinese, "coroutine" (协程) sounds exactly like the travel company Ctrip (携程); we mean the former, of course.)

Thread Implementation Models

Before talking about coroutines, let's first look at how threads are implemented.

There are three main ways to implement threads: the 1:1 threading model, in which lightweight processes are mapped one-to-one onto kernel threads; the N:1 threading model, in which user threads are implemented entirely on top of a single kernel thread; and the N:M threading model, a hybrid of user threads and lightweight processes.

1:1 threading model

The kernel thread (Kernel-Level Thread, KLT) mentioned above is a thread supported directly by the operating system kernel: the kernel schedules it through its scheduler and is responsible for switching between threads.

We know that in Linux systems programming, the fork() system call is often used to create a child process as a separate unit of execution. After a process calls fork(), the system first allocates resources to the new process (for example, space for its data and code) and then copies essentially all of the parent process's values into it; only a few values, such as the PID, differ. In effect, the parent process is duplicated.

Creating child processes with fork() to achieve parallelism therefore produces a large amount of redundant data: it occupies a lot of memory and consumes considerable CPU time initializing the memory space and copying the data.

If the data is the same anyway, why not simply share the parent process's data? This is where the lightweight process (Light Weight Process, LWP) comes in.

Unlike a child created with the fork() system call, an LWP is created with the clone() system call, which copies only part of the parent's resource data structures. What gets copied is configurable, and resources that are not copied can be shared with the child through pointers. As a result, a lightweight process is a smaller unit of execution and is faster to create and switch. LWPs are mapped one-to-one onto kernel threads, with each LWP backed by a kernel thread.

N:1 threading model

Because the 1:1 threading model maps threads one-to-one onto the kernel, thread creation and switching involve transitions between user mode and kernel mode, which carry a relatively high performance cost. It also has a scalability limit: system resources are finite, so a very large number of LWPs cannot be created.

The N:1 threading model solves both of these problems of the 1:1 model.

In this model, thread creation, synchronization, destruction, and scheduling are all completed in user space, without any help from the kernel. In other words, none of these operations triggers a switch between user mode and kernel mode, so thread operations are very fast and cheap.

N:M threading model

The disadvantage of the N:1 threading model is that the operating system cannot see user-mode threads, so when one of them makes a blocking system call, the kernel thread it runs on blocks, and with it the entire process.

The N:M threading model is a hybrid built on the two models above: user-mode threads attach to kernel threads through LWPs, and the numbers of user-mode threads and kernel-mode LWPs form an N:M mapping.

With these three threading models in mind, the difference between Go's coroutine implementation and Java's thread implementation becomes clear.

In JDK 1.8's Thread.java, the Thread.start() method is actually implemented by calling the native start0() method. On Linux, the JVM implements threads on top of pthread_create(), which in turn issues the clone() system call to create the thread.

Therefore, on Linux, Java currently uses user threads on top of lightweight processes, with one user thread mapped to one kernel thread, i.e., the 1:1 threading model. Because threads are scheduled by the kernel, switching from one thread to another involves a kernel context switch.
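As a rough, runnable illustration of this 1:1 mapping (OneToOneDemo is just a name invented for this article), every java.lang.Thread started below is backed by one kernel thread; the start0()/pthread_create()/clone() chain described above happens inside the JVM and is only summarized in the comments:

public class OneToOneDemo {
    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 4; i++) {
            // start() delegates to the native start0(), which on Linux ends up in
            // pthread_create()/clone(), creating one kernel thread per Thread object.
            Thread t = new Thread(() ->
                    System.out.println(Thread.currentThread().getName()
                            + " is scheduled by the OS kernel"));
            t.start();
            t.join(); // waiting for the thread also crosses into the kernel
        }
    }
}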

The Go language uses the N:M threading model and implements its own scheduler: it multiplexes (schedules) M coroutines onto N kernel threads. A coroutine's context switch is performed in user mode by Go's scheduler, so there is no need to trap into the kernel, which makes the switch comparatively cheap.

The Implementation Principle of Coroutines

Coroutines are not unique to Go. Most mainstream languages have their own coroutine support, including C#, Erlang, Python, Lua, JavaScript, and Ruby.

You are probably more familiar with processes and threads than with coroutines. A process usually represents an application service, and an application service can create multiple threads; a coroutine, however, is a different concept from either a process or a thread. You can think of a coroutine as something like a function, or a block of code inside a function, and we can easily create many coroutines within a single main thread.

Calling a coroutine differs from calling a function: a coroutine can suspend its own execution by pausing or blocking while other coroutines continue to run. The suspension happens purely inside the program (in user mode): the suspended coroutine hands execution over to another coroutine, and once the coroutine that received execution finishes, the suspended one is woken up again from its suspension point. Suspension and wake-up are handled by a scheduler.

The following diagram shows more clearly how coroutines built on the N:M threading model work.

Suppose the program creates two threads for coroutines by default. Coroutines A, B, C, D, and so on are created in the main thread and placed in a ready queue. The scheduler first assigns worker thread A to run coroutine A and another worker thread B to run coroutine B; the remaining coroutines wait their turn in the queue.

[Figure: coroutines from the ready queue being assigned to worker threads]

When coroutine A calls a pause method or gets blocked, it moves into the suspended queue, and the scheduler picks another coroutine from the waiting queue to take over thread A. When coroutine A is later woken up, it re-enters the ready queue and competes for a thread through the scheduler; if it wins a thread it resumes execution, otherwise it keeps waiting.

[Figure: a suspended coroutine re-entering the ready queue after being woken up]
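To make the ready-queue and suspended-queue mechanics above more concrete, here is a deliberately simplified, single-threaded toy sketch in Java. ToyScheduler and ToyCoroutine are invented names, and real schedulers such as Go's or Kilim's run coroutines across multiple worker threads and handle suspension far more cleverly; this only shows the basic idea of taking coroutines from a ready queue and putting paused ones back:

import java.util.ArrayDeque;
import java.util.Queue;

public class ToyScheduler {

    // A "coroutine" here is just a resumable step: it runs until its next
    // voluntary pause and reports whether it has finished.
    interface ToyCoroutine {
        boolean step();
    }

    private final Queue<ToyCoroutine> ready = new ArrayDeque<>();

    void spawn(ToyCoroutine c) {
        ready.add(c); // newly created coroutines wait in the ready queue
    }

    void run() {
        while (!ready.isEmpty()) {
            ToyCoroutine c = ready.poll(); // the carrier thread picks a ready coroutine
            boolean done = c.step();       // run it until it pauses or finishes
            if (!done) {
                ready.add(c);              // paused but not finished: back to the ready queue
            }
        }
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        for (char name = 'A'; name <= 'D'; name++) {
            final char n = name;
            final int[] slicesLeft = {3};
            scheduler.spawn(() -> {
                System.out.println("coroutine " + n + " runs one slice");
                return --slicesLeft[0] == 0;
            });
        }
        scheduler.run(); // A, B, C and D are interleaved on a single carrier thread
    }
}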

Compared with threads, coroutines avoid the CPU context switches caused by contention for synchronized resources, which makes them a good fit for I/O-intensive applications. In network requests in particular, much of the time is spent waiting for a backend response, and coroutines keep threads from blocking on that wait, making full use of multi-core, multi-threaded capacity. For CPU-intensive applications, where the CPU is busy most of the time anyway, the advantage of coroutines is much less pronounced.

The Kilim Coroutine Framework

Although so many languages have implemented coroutines, the Java language itself does not yet support them natively. Don't be discouraged, though: we can still use coroutines in Java through a coroutine framework.

Kilim is currently the most widely used coroutine framework in Java; with it, developers can use coroutines in Java at low cost.

Bringing Kilim into a Java project is not quite like adding an ordinary third-party component. Besides adding the jar, you also need to run Kilim's Weaver tool over the bytecode produced by compiling your Java code to enhance it, for example to identify which methods are pausable and to add context handling to those methods. There are usually four ways to perform this weaving:

  • using the Maven plugin at compile time;
  • invoking the kilim.tools.Weaver tool at runtime;
  • launching your class files through kilim.tools.Kilim at runtime;
  • adding if (kilim.tools.Kilim.trampoline(false, args)) return; at the start of the main method (see the sketch just below).
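As a minimal sketch of the fourth option (WeaveAtRuntime is an invented class name; the trampoline call is the same one used in the Coroutine example later in this article):

public class WeaveAtRuntime {
    public static void main(String[] args) {
        // If this invocation has not been woven yet, Kilim takes care of re-running
        // the program with weaving applied and this call returns true, so we just return.
        if (kilim.tools.Kilim.trampoline(false, args)) return;

        // From here on, methods declared "throws Pausable" have been instrumented.
        System.out.println("running with Kilim weaving applied");
    }
}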

The Kilim framework consists of four core components: the task carrier (Task), the task context (Fiber), the task scheduler (Scheduler), and the communication carrier (Mailbox).

[Figure: Kilim's four core components: Task, Fiber, Scheduler, and Mailbox]

The Task object is used to execute the business logic; you can think of it as the counterpart of a multithreaded Thread. Like Thread, Task has a run-style method, except that in Task it is named execute, and we write the business logic the coroutine should perform inside the execute method.
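As a minimal sketch of such a Task (HelloTask is an invented name; the APIs it uses, Task, Pausable, start(), Task.idledown(), and the trampoline bootstrap, are the same ones that appear in the Producer/Consumer example later in this article):

import kilim.Pausable;
import kilim.Task;

public class HelloTask extends Task<Object> {

    @Override
    public void execute() throws Pausable {
        // The coroutine's business logic lives in execute(); it may pause at any
        // call that is declared "throws Pausable".
        System.out.println(Thread.currentThread().getName() + ": hello from a Kilim coroutine");
    }

    public static void main(String[] args) {
        if (kilim.tools.Kilim.trampoline(false, args)) return; // runtime weaving, as above
        new HelloTask().start(); // submit the task to Kilim's scheduler
        Task.idledown();         // wait for all tasks to finish, then shut the scheduler down
    }
}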

Like a thread built on Thread, a coroutine built on Task has a lifecycle with five states: Ready, Running, Pausing, Paused, and Done. A newly created Task is in the Ready state; once its execute() method is called it is Running. While running, the coroutine can be paused: its state is Pausing while the pause is in progress and Paused once it is suspended, and a paused coroutine can be woken up again. When the coroutine finishes normally, its state is Done.

The Fiber object is similar to a Java thread stack: it maintains the execution stack of a Task and is the key to Kilim's N:M thread mapping.

Scheduler is the core scheduler of Kilim's coroutine implementation. It dispatches Tasks to designated worker threads (WorkerThread) for execution; by default, the number of worker threads is initialized to the number of CPUs of the machine.

A Mailbox is just what it sounds like: a mailbox through which coroutines communicate and share data. The biggest difference from threads here is that threads share data through shared memory, whereas coroutines share data by communicating, precisely to avoid the thread-safety problems that come with sharing memory.

Performance Comparison: Coroutines vs. Threads

Next, let's compare the performance of coroutines and threads with a simple producer/consumer example.

Java multithreaded implementation:

import java.util.ArrayList;
import java.util.List;

public class MyThread {

    private static int count = 0;                    // number of products currently available
    private static final int FULL = 10;              // maximum number of products
    private static final Object LOCK = new Object(); // lock protecting count

    public static void main(String[] args) {
        MyThread test1 = new MyThread();

        long start = System.currentTimeMillis();

        List<Thread> list = new ArrayList<Thread>();
        for (int i = 0; i < 1000; i++) { // create 1,000 producer threads
            Thread thread = new Thread(test1.new Producer());
            thread.start();
            list.add(thread);
        }

        for (int i = 0; i < 1000; i++) { // create 1,000 consumer threads
            Thread thread = new Thread(test1.new Consumer());
            thread.start();
            list.add(thread);
        }

        try {
            for (Thread thread : list) {
                thread.join(); // wait for all threads to finish
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        long end = System.currentTimeMillis();
        System.out.println("Multithreaded execution time: " + (end - start));
    }

    // producer
    class Producer implements Runnable {
        public void run() {
            for (int i = 0; i < 10; i++) {
                synchronized (LOCK) {
                    while (count == FULL) { // buffer is full, wait for a consumer
                        try {
                            LOCK.wait();
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                    count++;
                    System.out.println(Thread.currentThread().getName() + " produced one, total is now " + count);
                    LOCK.notifyAll();
                }
            }
        }
    }

    // consumer
    class Consumer implements Runnable {
        public void run() {
            for (int i = 0; i < 10; i++) {
                synchronized (LOCK) {
                    while (count == 0) { // nothing to consume, wait for a producer
                        try {
                            LOCK.wait();
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                    count--;
                    System.out.println(Thread.currentThread().getName() + " consumed one, total is now " + count);
                    LOCK.notifyAll();
                }
            }
        }
    }
}

Kilim coroutine implementation:

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import kilim.Mailbox;
import kilim.Task;

public class Coroutine {

    // One mailbox per consumer: multiple consumers cannot share a mailbox here,
    // so each consumer gets its own. This is also how coroutines keep shared
    // data thread-safe: by communicating instead of sharing memory.
    static Map<Integer, Mailbox<Integer>> mailMap = new HashMap<Integer, Mailbox<Integer>>();

    public static void main(String[] args) {

        if (kilim.tools.Kilim.trampoline(false, args)) return;
        Properties propes = new Properties();
        propes.setProperty("kilim.Scheduler.numThreads", "1"); // use a single worker thread
        System.setProperties(propes);
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 1000; i++) { // create 1,000 producers
            Mailbox<Integer> mb = new Mailbox<Integer>(1, 10);
            new Producer(i, mb).start();
            mailMap.put(i, mb);
        }

        for (int i = 0; i < 1000; i++) { // create 1,000 consumers
            new Consumer(mailMap.get(i)).start();
        }

        Task.idledown(); // wait until all tasks are done and the scheduler is idle

        long endTime = System.currentTimeMillis();
        System.out.println(Thread.currentThread().getName() + " coroutine execution time: " + (endTime - startTime));
    }

}

// producer
import kilim.Mailbox;
import kilim.Pausable;
import kilim.Task;

public class Producer extends Task<Object> {

    Integer count = null;
    Mailbox<Integer> mb = null;

    public Producer(Integer count, Mailbox<Integer> mb) {
        this.count = count;
        this.mb = mb;
    }

    public void execute() throws Pausable {
        count = count * 10;
        for (int i = 0; i < 10; i++) {
            mb.put(count); // pauses the coroutine when the mailbox is full
            System.out.println(Thread.currentThread().getName() + " produced, mailbox now holds " + mb.size() + ", produced: " + count);
            count++;
        }
    }
}

// consumer
import kilim.Mailbox;
import kilim.Pausable;
import kilim.Task;

public class Consumer extends Task<Object> {

    Mailbox<Integer> mb = null;

    public Consumer(Mailbox<Integer> mb) {
        this.mb = mb;
    }

    /**
     * Execute the consumer coroutine.
     */
    public void execute() throws Pausable {
        Integer c = null;
        for (int i = 0; i < 10; i++) { // each mailbox receives exactly 10 messages
            c = mb.get(); // pauses the coroutine until a message is available

            if (c == null) {
                System.out.println("got no message"); // defensive; the pausable get() normally blocks until a message arrives
            } else {
                System.out.println(Thread.currentThread().getName() + " consumed, mailbox now holds " + mb.size() + ", consumed: " + c);
                c = null;
            }
        }
    }
}

In this example I created 1,000 producers and 1,000 consumers; each producer produces 10 products, and the 1,000 consumers consume them concurrently. The results of the two programs are as follows:

Multithreaded execution time: 2761

Coroutine execution time: 1050

This comparison shows that in scenarios with heavy blocking, coroutines come out ahead. In fact, I/O-blocking scenarios are exactly where coroutines are mainly used in Java.

Summary

Coroutines and threads are closely related. A coroutine can be thought of as a block of code running on a thread, and the suspend operation a coroutine offers pauses the coroutine without blocking the underlying thread.

Coroutines are also a lightweight resource: creating thousands of them is not much of a burden on the system, whereas creating thousands of threads would put it under real pressure. In this sense, the coroutine design greatly improves thread utilization.

Coroutines are a design idea rather than a feature of any single language, and Java can already use them through coroutine frameworks.

That said, coroutines are most mature in Go; coroutines in Java are still not very stable and, more importantly, lack validation in large-scale projects. Java's coroutine story still has a long way to go.

