How does Go achieve a hot restart

Author: zhijiezhang, backend development engineer at Tencent PCG

Recently, while optimizing trpc, our company's framework, I ran into a problem related to hot restart. After the optimization I summarized what I had learned, and this article is a simple review of how to achieve hot restart in Go.

1. What is a hot restart?

Hot restart is a means of ensuring service availability. It keeps established connections from being interrupted while the service restarts: the old service process stops accepting new connection requests, and new connection requests are accepted by the new service process. For connections already established in the old process, you can also shut down the read side, wait until the requests on the connection have been processed and the connection is idle, and only then exit. This ensures that established connections are not interrupted, that the transactions (request, processing, response) on them complete normally, and that the new service process can accept connections and process their requests normally. Of course, the smooth exit of the process during a hot restart involves not only the transactions on connections; message services and custom tasks also need attention.

This is a rough description of hot restart as I understand it. Is hot restart still needed today? My understanding is that it depends on the scenario.

Take backend development as an example. If the operations platform can automatically remove traffic from a service when it is upgraded or restarted and automatically add the traffic back when the service is ready, and if the service's QPS and request processing time can be reasonably estimated, then simply configuring a reasonable waiting time before stopping achieves an effect similar to a hot restart. In this case, supporting hot restart in the backend service is unnecessary. However, if we develop a microservice framework, we cannot make such assumptions about the future deployment platform and environment. It is also possible that a user deploys on only one or two physical machines without any load balancing facilities and still does not want to be disturbed by restarts; then a hot restart is necessary. Of course, there are more complex and demanding scenarios that also require hot restart capabilities.

Hot restart is an important means of ensuring service quality, and it is worth understanding. That is the original intention of this article.

2. How to achieve a hot restart?

How to achieve a hot restart cannot really be generalized; it has to be combined with the actual scenario (such as the service programming model, availability requirements, etc.). The rough idea can be laid out first.

Generally, to achieve a hot restart, roughly the following steps are required:

  • First, let the old process, called the parent process here, fork a child process to replace it;

  • Then, once the child process is ready, it notifies the parent process, starts accepting new connection requests, and processes the requests received on those connections;

  • Finally, after the parent process has processed the requests on its established connections and the connections are idle, it exits smoothly.

It sounds simple...

2.1. Know fork

Everyone knows the fork() system call. A parent process calling fork creates a copy of the process, and the code can tell whether it is running in the child or the parent by checking whether fork's return value is 0.

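#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork"); /* fork failed, no child process was created */
        return 1;
    } else if (pid == 0) {
        printf("i am child process\n");
    } else {
        printf("i am parent process, i have a child process named %d\n", (int)pid);
    }
    return 0;
}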

Some developers may not know the implementation principle of fork, or why the return value of fork is different in the parent-child process, or how to make the return value of the parent-child process different... Understanding these requires a bit of knowledge accumulation.

2.2. Return value

In brief, the ABI defines conventions for function calls: how parameters are passed, how values are returned, and so on. Taking x86-64 as an example, a return value that fits in the rax register is generally returned through rax.

What if the rax register is not wide enough to hold the return value? That is simple too: the compiler inserts instructions to handle it, and the specific instructions depend on the language and compiler implementation.

  • In C, the address of the return value may be passed in rdi or another register, and the called function writes the return value into the memory that rdi points to, using multiple instructions;

  • In C, it is also possible for the called function to hold the result in multiple registers (rax, rdx, ...) and assign the register values to the variable when the function returns;

  • It may also be returned through stack memory, as golang did before it adopted a register-based calling convention in Go 1.17;

2.3. fork return value

The return value of the fork system call is a bit special: it differs between the parent process and the child process. How is this done?

Think about what the operating system kernel has to do when the parent process calls fork: allocate a process control block, allocate a pid, allocate memory space... There is certainly a lot to do. Pay attention here to the hardware context of the process. It is very important, because when the scheduler selects a process to run, it has to restore that process's hardware context.

When Linux forks, it makes certain modifications to the hardware context of the child process. The goal is simply to make the pid obtained after fork be 0 in the child, so what should be done? As mentioned in section 2.2, the rax register is more than enough for such small integers. When fork returns in the parent, the pid allocated by the operating system is placed in the rax register.

For the child process, the kernel only needs to clear the rax register in its hardware context to 0 during fork. Once all the other setup is done, the kernel changes the child's state from uninterruptible sleep to runnable. When the scheduler later picks the child, it first restores its hardware context, including PC, rax, etc., so that after fork returns the value in rax is 0, and the value finally assigned to pid is 0.

Therefore, it is possible to distinguish whether the current process is a parent process or a child process by this way of judging whether "pid is equal to 0".

2.4. Limitations

Many people know that fork can create a copy of a process and continue to execute it, and different branch logic can be executed according to the return value of fork. If the process is multi-threaded, will calling fork in one thread copy the entire process?

fork only creates a copy of the thread that calls it; the other running threads in the process are not copied. This means that for multi-threaded programs, it is not feasible to create a complete copy of the process through fork.

As mentioned earlier, fork is an important part of hot restart. This limitation of fork constrains how hot restart can be implemented under different service programming models. That is why each case has to be analyzed on its own: different programming models can use different implementations.

3. Single process single thread model

Many people may consider the single-process, single-threaded model obsolete and unusable in production. Really? Even something as strong as redis is single-threaded, isn't it? The point is simply that the single-threaded model is not useless. OK, back to the topic: let's focus on how the single-process, single-threaded model can achieve hot restart.

With a single process and a single thread, hot restart is easier to implement:

  • You can create a child process with a fork,

  • The child process can inherit the resources in the parent process, such as opened file descriptors, including listenfd and connfd of the parent process,

  • The parent process can choose to close listenfd, and the subsequent task of accepting the connection will be handed over to the child process to complete.

  • The parent process can even close connfd, allowing the child process to process connection requests, return packets, etc., or process the requests on the established connection by itself;

  • The parent process chooses an appropriate point in time to exit, and the child process takes over.

The core ideas are these, but when it comes to realization, there are many ways:

  • You can choose the fork method to let the child process get the original listenfd, connfd,

  • The parent process can also choose a unix domain socket to send listenfd and connfd to the child process.

Some readers may ask: can I avoid passing these fds altogether?

  • For example, I enable reuseport; the parent process finishes processing the requests on its established connections (connfd) and then closes them, while reuseport.Listen in the child process directly creates a new listenfd.

That works too! But some issues must be considered in advance:

  • Although reuseport allows multiple processes to listen on the same port and so seems to meet the requirement, be aware that any process with the same euid can listen on this port. That is not safe!

  • The implementation of reuseport is platform-dependent. On Linux, when the same address+port is listened on multiple times, the kernel can load-balance incoming connections across the resulting listenfds, but this is not the case on darwin!

Of course, the problems mentioned here also exist under the multi-threaded model.

4. Single-process multi-threaded model

The aforementioned problems also appear in the multi-threaded model:

  • fork can only copy the calling thread, not the whole process!

  • The multiple fds obtained by listening on the same address+port with reuseport behave differently on different platforms, and load balancing may not be achieved when accepting connections!

  • Without reuseport, listening on the same address+port again will fail!

  • Re-listening through reuseport instead of passing the fd is not safe: unrelated service processes may listen on the same port, and then it is game over!

  • The logic for a smooth exit of the parent process: close listenfd, wait for request processing on connfd to finish, close connfd, and once everything is in order the parent process exits and the child process takes over!

5. Other threading models

Other threading models are basically implementations or combinations of sections 3 and 4 above, and the corresponding problems are similar, so I will not repeat them.

6. Hot restart in Go: trigger timing

We need to choose when to trigger a hot restart. When should it be triggered? The operating system provides a signal mechanism that lets a process install custom signal handling.

Killing a process is usually done with kill -9, which sends a SIGKILL signal to the process. This signal cannot be caught, and neither can SIGSTOP. This lets the process owner or a privileged user control the life and death of the process for better management.

kill can also be used to send other signals to the process, such as SIGUSR1, SIGUSR2, SIGINT, etc. The process can receive these signals and handle them accordingly. Here you can choose SIGUSR1 or SIGUSR2 to notify the process to hot restart.

go func() {
    ch := make(chan os.Signal, 1)
    signal.Notify(ch, syscall.SIGUSR2)
    <-ch

    // the hot restart logic can be executed from here on
    ...
}()
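To make the trigger concrete, here is a minimal, self-contained sketch for a Unix-like system. The helper names waitForSignal, sendUSR2ToSelf and selfTriggeredRestart are made up for this illustration, and the process sends SIGUSR2 to itself only as a stand-in for an operator running kill -USR2 <pid>.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// waitForSignal registers for sig, runs trigger, and reports whether the
// signal was delivered to this process before the timeout expired.
func waitForSignal(sig syscall.Signal, timeout time.Duration, trigger func()) bool {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, sig)
	defer signal.Stop(ch)

	trigger()

	select {
	case <-ch:
		return true
	case <-time.After(timeout):
		return false
	}
}

// sendUSR2ToSelf stands in for an external `kill -USR2 <pid>`.
func sendUSR2ToSelf() {
	_ = syscall.Kill(os.Getpid(), syscall.SIGUSR2)
}

// selfTriggeredRestart wires the two helpers together for the demo.
func selfTriggeredRestart() bool {
	return waitForSignal(syscall.SIGUSR2, time.Second, sendUSR2ToSelf)
}

func main() {
	if selfTriggeredRestart() {
		fmt.Println("got SIGUSR2, the hot restart logic would start here")
	} else {
		fmt.Println("no signal received")
	}
}
```

In a real service the goroutine would stay blocked on the channel for the lifetime of the process; the timeout exists only so that the sketch terminates.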

7. How to judge a hot restart

After the go program is restarted, all the runtime state information is new, so how can I tell whether I am a child process, or whether I want to perform hot restart logic? The parent process can set the environment variable when the child process is initialized, such as adding HOT_RESTART=1.

This requires the code to first check whether the environment variable HOT_RESTART is 1 at the appropriate place, if it is true, then execute the hot restart logic, otherwise execute a new startup logic.
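A minimal sketch of this check follows. The names isHotRestart and childEnv are hypothetical helpers introduced for the illustration; HOT_RESTART is the variable name used in the text. childEnv removes any existing HOT_RESTART entry before appending the new one, so that the child never sees two conflicting values.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const envHotRestart = "HOT_RESTART"

// isHotRestart reports whether the value of HOT_RESTART marks this process
// as the child of a hot restart.
func isHotRestart(value string) bool {
	return value == "1"
}

// childEnv returns the environment the parent would hand to the child:
// the parent's own environment with HOT_RESTART forced to 1.
func childEnv(parentEnv []string) []string {
	env := make([]string, 0, len(parentEnv)+1)
	for _, kv := range parentEnv {
		if !strings.HasPrefix(kv, envHotRestart+"=") {
			env = append(env, kv)
		}
	}
	return append(env, envHotRestart+"=1")
}

func main() {
	if isHotRestart(os.Getenv(envHotRestart)) {
		fmt.Println("hot restart: rebuild listeners from inherited fds")
		return
	}
	fmt.Println("fresh start: create new listeners")
	fmt.Println("a child would be started with", len(childEnv(os.Environ())), "environment entries")
}
```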

8. ForkExec

If the current process wants to execute the hot restart logic after receiving the SIGUSR2 signal, it needs to call syscall.ForkExec(...) to create a child process. Note that Go is different from C/C++: its runtime relies on multiple threads to schedule goroutines, so a Go program is naturally multi-threaded. It does not create these threads through the NPTL thread library, though, but through the clone system call.

As mentioned earlier, a plain fork only copies the thread that calls fork and can do nothing about the other threads in the process. Therefore a naturally multi-threaded program like a Go program has to be started again from the beginning. That is why the function the Go standard library provides is syscall.ForkExec rather than syscall.Fork.

9. Hot restart in Go: pass listenfd

There are several ways to pass an fd in Go: pass it when the parent process forks the child process, or pass it later through a unix domain socket. Note that what we actually pass is a file description, not a file descriptor.

Attached is a diagram of the relationship between file descriptor, file description, and inode under Unix-like systems:

fds are allocated from small to large. If the fd in the parent process is 10, it is not necessarily 10 after being passed to the child process. So is the fd in the child process predictable? It can be predicted, but relying on that is not recommended. So I provide two ways to achieve it.

9.1 ForkExec+ProcAttr{Files: []uintptr{}}

Passing a listenfd is very simple. If it is of type net.Listener, use tcpln := ln.(*net.TCPListener); file, _ := tcpln.File(); fd := file.Fd() to get an fd for the listener's underlying file description.

Note that this fd is not the original fd of the underlying file description but a duplicate (allocated when tcpln.File() is called), so the reference count of the underlying file description is incremented by 1. If you try to close the listening socket through ln.Close() alone, sorry, it will not be closed. You also need to call file.Close() to close the newly created fd and decrement the reference count; the socket is only really closed once the count reaches 0.

Imagine that for a hot restart we must wait until the requests received on existing connections have been processed before the process can exit, while during this period the parent process must no longer accept new connection requests. If the listener cannot be closed properly, this goal cannot be achieved. So handle the dup'ed fd carefully here, and don't forget to close it.
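The effect of the duplicated fd can be observed with a small experiment. This is only a sketch: canDial and dupKeepsSocketOpen are names invented for the illustration, and it assumes a loopback interface on a Unix-like system, where a connection to a socket that is still listening completes even if nobody calls Accept.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// canDial reports whether a TCP connection to addr can be established.
func canDial(addr string) bool {
	c, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return false
	}
	c.Close()
	return true
}

// dupKeepsSocketOpen closes the listener first and the duplicated file second,
// and reports after each step whether the port still accepts connections.
func dupKeepsSocketOpen() (afterLnClose, afterFileClose bool, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false, err
	}
	addr := ln.Addr().String()

	f, err := ln.(*net.TCPListener).File() // f holds a duplicate of the listening fd
	if err != nil {
		ln.Close()
		return false, false, err
	}

	ln.Close() // the socket stays open because f still refers to it
	afterLnClose = canDial(addr)

	f.Close() // last reference gone: the socket is really closed now
	afterFileClose = canDial(addr)
	return afterLnClose, afterFileClose, nil
}

func main() {
	a, b, err := dupKeepsSocketOpen()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reachable after ln.Close():", a)
	fmt.Println("reachable after file.Close():", b)
}
```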

OK, now let's talk about syscall.ProcAttr{Files: []uintptr{}}. Files holds the fds in the parent process that are to be passed to the child. For example, to pass stdin, stdout, and stderr to the child process, you need to put os.Stdin.Fd(), os.Stdout.Fd(), and os.Stderr.Fd() into it. If you want to pass the listenfd from above, you need to append the fd returned by file.Fd() as well.

In the child process, these fds are numbered in increasing order starting from 0 (0, 1, 2, 3, ...), so the passed fds are predictable. If two listenfds are passed in addition to stdin, stdout, and stderr, it can be predicted that they are 3 and 4. This is the common practice on Unix-like systems. The child process can count up from 3 according to the number of fds passed (for example, told to the child process through an environment variable such as FD_NUM=2) and conclude that these two fds are 3 and 4.

The parent and child processes can agree on the order in which the listenfds are passed, so that the child process can handle them according to the same agreement. Alternatively, the child can rebuild a listener from each fd and use the listener's network+address to determine which logical service it belongs to. Both are possible!
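The counting convention can be sketched as follows. The helper name inheritedFDs and the fallback value used in main are assumptions made for the illustration; FD_NUM is the example variable from the text.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// firstInheritedFD is the first fd after stdin (0), stdout (1) and stderr (2).
const firstInheritedFD = 3

// inheritedFDs turns the value of FD_NUM into the list of fds the child
// expects to have inherited: 3, 4, ... up to 3+n-1.
func inheritedFDs(fdNum string) ([]uintptr, error) {
	n, err := strconv.Atoi(fdNum)
	if err != nil {
		return nil, fmt.Errorf("invalid FD_NUM %q: %w", fdNum, err)
	}
	if n < 0 {
		return nil, fmt.Errorf("invalid FD_NUM %d: must not be negative", n)
	}
	fds := make([]uintptr, 0, n)
	for i := 0; i < n; i++ {
		fds = append(fds, uintptr(firstInheritedFD+i))
	}
	return fds, nil
}

func main() {
	v := os.Getenv("FD_NUM")
	if v == "" {
		v = "2" // demo fallback so that the sketch runs without a parent
	}
	fds, err := inheritedFDs(v)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("expected inherited fds:", fds)
}
```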

Note that file.Fd() sets the underlying file description to blocking mode. Before rebuilding the listener you can set it back to non-blocking with syscall.SetNonblock(fd, true), and then use file := os.NewFile(uintptr(fd), name); tcplistener, _ := net.FileListener(file), or udpconn, _ := net.FilePacketConn(file). From the listening address of the tcplistener or udpconn you can then associate it with the corresponding logical service.

I will add that net.FileListener(f) and net.FilePacketConn(f) internally call newFileFd()->dupSocket(), and these functions reset the file description corresponding to the fd to non-blocking. The file description behind the listener is shared by the parent and child processes, so there is no need to set it to non-blocking explicitly.

Some microservice frameworks support grouping a service into multiple logical services. Google's protobuf specification also supports multiple service definitions, and Tencent's goneat and trpc frameworks support this as well.

Of course, I will not write a complete demo here that covers everything described above, as that would take too much space. Here is only a condensed example; interested readers can code and test the rest themselves. What you learn on paper always feels shallow, so you still have to practice.

package main

import (
 "fmt"
 "io/ioutil"
 "log"
 "net"
 "os"
 "strconv"
 "sync"
 "syscall"
 "time"
)

const envRestart = "RESTART"
const envListenFD = "LISTENFD"

func main() {

 v := os.Getenv(envRestart)

 if v != "1" {

  ln, err := net.Listen("tcp", "localhost:8888")
  if err != nil {
   panic(err)
  }

  wg := sync.WaitGroup{}
  wg.Add(1)
  go func() {
   defer wg.Done()
   for {
    ln.Accept()
   }
  }()

  tcpln := ln.(*net.TCPListener)
  f, err := tcpln.File()
  if err != nil {
   panic(err)
  }

  os.Setenv(envRestart, "1")
  os.Setenv(envListenFD, fmt.Sprintf("%d", f.Fd()))

  _, err = syscall.ForkExec(os.Args[0], os.Args, &syscall.ProcAttr{
   Env:   os.Environ(),
   Files: []uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd(), f.Fd()},
   Sys:   nil,
  })
  if err != nil {
   panic(err)
  }
  log.Print("parent pid:", os.Getpid(), ", pass fd:", f.Fd())
  f.Close()
  wg.Wait()

 } else {

  v := os.Getenv(envListenFD)
  fd, err := strconv.ParseInt(v, 10, 64)
  if err != nil {
   panic(err)
  }
  log.Print("child pid:", os.Getpid(), ", recv fd:", fd)

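  // case1: understand the relationship between file descriptor and file description mentioned above
  // the child inherits some fds passed by the parent, but their numeric values may differ from those in the parent
  // uncomment to test...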
  //ff := os.NewFile(uintptr(fd), "")
  //if ff != nil {
  // _, err := ff.Stat()
  // if err != nil {
  //  log.Println(err)
  // }
  //}

  // case2: assuming the parent shares fd 0/1/2/listenfd with the child, the child can predict that listenfd=3
  ff := os.NewFile(uintptr(3), "")
  fmt.Println("fd:", ff.Fd())
  if ff != nil {
   _, err := ff.Stat()
   if err != nil {
    panic(err)
   }

   // pause here and run `lsof -P -p $pid` to check whether listenfd was passed over; besides 0, 1, 2 you should see 3
   // ctrl+d to continue
   ioutil.ReadAll(os.Stdin)

   fmt.Println("....")
   _, err = net.FileListener(ff)
   if err != nil {
    panic(err)
   }

   // pause here and run `lsof -P -p $pid`; you will find two listening fds,
   // because net.FileListener dup'ed one; if ff is not explicitly closed here, the listener cannot be closed
   ff.Close()

   time.Sleep(time.Minute)
  }

  time.Sleep(time.Minute)
 }
}

The code above is a simple example that roughly shows how to use ProcAttr to pass listenfd. One question remains: what if the set of fds passed by the parent process changes later, for example stdin, stdout, and stderr are no longer passed? Should the child then assume that numbering starts from 0? We can tell the child process through environment variables at which number the listenfds start and how many there are, so this is also achievable.

This implementation can be cross-platform.

If you are interested, you can look at grace, an implementation provided by facebook.

9.2 unix domain socket + cmsg

Another way of thinking is to pass it through unix domain socket + cmsg. When the parent process starts, it still uses ForkExec to create the child process, but does not pass the listenfd through ProcAttr.

Before creating the child process, the parent process creates a unix domain socket and listens on it. After the child process has started, it connects to this unix domain socket, and the parent process sends listenfd to the child process through cmsg. The fd is obtained in the same way as in 9.1, and closing the fd has to be handled in the same way.

The child process connects to the unix domain socket and starts to receive the cmsg. When the kernel receives the message on behalf of the child process, it finds that it carries an fd of the parent process. The kernel looks up the corresponding file description and allocates an fd in the child process, establishing the mapping between the two. When the call returns, the child process gets an fd that refers to that file description. It can then obtain a file through os.NewFile(uintptr(fd), name), and a tcplistener or udpconn through net.FileListener or net.FilePacketConn.

The remaining steps, obtaining the listening address and associating it with the logical service, are the same as described in section 9.1.

Here I also provide a runnable simplified version of the demo for everyone to understand and test.

package main

import (
 "fmt"
 "io/ioutil"
 "log"
 "net"
 "os"
 "strconv"
 "sync"
 "syscall"
 "time"

 passfd "github.com/ftrvxmtrx/fd"
)

const envRestart = "RESTART"
const envListenFD = "LISTENFD"
const unixsockname = "/tmp/xxxxxxxxxxxxxxxxx.sock"

func main() {

 v := os.Getenv(envRestart)

 if v != "1" {

  ln, err := net.Listen("tcp", "localhost:8888")
  if err != nil {
   panic(err)
  }

  wg := sync.WaitGroup{}
  wg.Add(1)
  go func() {
   defer wg.Done()
   for {
    ln.Accept()
   }
  }()

  tcpln := ln.(*net.TCPListener)
  f, err := tcpln.File()
  if err != nil {
   panic(err)
  }

  os.Setenv(envRestart, "1")
  os.Setenv(envListenFD, fmt.Sprintf("%d", f.Fd()))

  _, err = syscall.ForkExec(os.Args[0], os.Args, &syscall.ProcAttr{
   Env:   os.Environ(),
   Files: []uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd(), /*f.Fd()*/}, // comment this when test unixsock
   Sys:   nil,
  })
  if err != nil {
   panic(err)
  }
  log.Print("parent pid:", os.Getpid(), ", pass fd:", f.Fd())

  os.Remove(unixsockname)
  unix, err := net.Listen("unix", unixsockname)
  if err != nil {
   panic(err)
  }
  unixconn, err := unix.Accept()
  if err != nil {
   panic(err)
  }
  err = passfd.Put(unixconn.(*net.UnixConn), f)
  if err != nil {
   panic(err)
  }

  f.Close()
  wg.Wait()

 } else {

  v := os.Getenv(envListenFD)
  fd, err := strconv.ParseInt(v, 10, 64)
  if err != nil {
   panic(err)
  }
  log.Print("child pid:", os.Getpid(), ", recv fd:", fd)

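  // case1: some readers think the fd can be passed through an environment variable; that does not work, because the value does not correspond to an fd in the child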
  //ff := os.NewFile(uintptr(fd), "")
  //if ff != nil {
  // _, err := ff.Stat()
  // if err != nil {
  //  log.Println(err)
  // }
  //}

  // case2: with only one listenfd, if the parent passes exactly 0/1/2/listenfd when forking, listenfd is guaranteed to be 3 in the child
  //ff := os.NewFile(uintptr(3), "")
  //if ff != nil {
  // _, err := ff.Stat()
  // if err != nil {
  //  panic(err)
  // }
  // // pause, ctrl+d to continue
  // ioutil.ReadAll(os.Stdin)
  // fmt.Println("....")
  // _, err = net.FileListener(ff) // dups an fd, so there is more than one listening fd
  // if err != nil {
  //  panic(err)
  // }
  // // lsof -P -p $pid shows two listening fds
  // time.Sleep(time.Minute)
  //}
  // pause here so that system commands can be run to inspect the current state of the process
  // run: lsof -P -p $pid to check the listenfd

  ioutil.ReadAll(os.Stdin)
  fmt.Println(".....")

  unixconn, err := net.Dial("unix", unixsockname)
  if err != nil {
   panic(err)
  }

  files, err := passfd.Get(unixconn.(*net.UnixConn), 1, nil)
  if err != nil {
   panic(err)
  }

  // run lsof -P -p $pid again here to check the listenfd

  f := files[0]
  f.Stat()

  time.Sleep(time.Minute)
 }
}

This implementation is limited to unix-like systems.

If several services are deployed on the same machine, you need to choose the file name of the unix domain socket carefully to avoid problems caused by name clashes. Consider using "process name.pid" as the name of the unix domain socket and passing it to the child process through an environment variable.
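The demo above relies on a third-party package for the cmsg handling. As a rough sketch of what happens underneath, the same transfer can be written with the standard library alone. Everything here runs inside one process on a Unix-like system, with a socketpair standing in for the parent and child ends, and the helper names socketPair, sendFile, recvFile and passListener are invented for the illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// socketPair returns two connected unix domain sockets. In a real hot restart
// the two ends would live in the parent and in the child process.
func socketPair() (*net.UnixConn, *net.UnixConn, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return nil, nil, err
	}
	var conns [2]*net.UnixConn
	for i, fd := range fds {
		f := os.NewFile(uintptr(fd), "socketpair")
		c, err := net.FileConn(f) // FileConn works on a duplicate of f's fd
		f.Close()
		if err != nil {
			return nil, nil, err
		}
		conns[i] = c.(*net.UnixConn)
	}
	return conns[0], conns[1], nil
}

// sendFile sends f's fd as SCM_RIGHTS ancillary data along with one dummy byte.
func sendFile(c *net.UnixConn, f *os.File) error {
	_, _, err := c.WriteMsgUnix([]byte{0}, syscall.UnixRights(int(f.Fd())), nil)
	return err
}

// recvFile receives exactly one fd and wraps it in an *os.File.
func recvFile(c *net.UnixConn) (*os.File, error) {
	buf := make([]byte, 1)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one 4-byte fd
	_, oobn, _, _, err := c.ReadMsgUnix(buf, oob)
	if err != nil {
		return nil, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil || len(msgs) != 1 {
		return nil, errors.New("expected exactly one control message")
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil || len(fds) != 1 {
		return nil, errors.New("expected exactly one fd")
	}
	return os.NewFile(uintptr(fds[0]), "received-listenfd"), nil
}

// passListener sends a TCP listener's fd across the socket pair, rebuilds a
// listener from the received fd, and returns both listening addresses.
func passListener() (string, string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer ln.Close()
	f, err := ln.(*net.TCPListener).File()
	if err != nil {
		return "", "", err
	}
	defer f.Close() // the duplicate has to be closed too, as described in 9.1
	parent, child, err := socketPair()
	if err != nil {
		return "", "", err
	}
	defer parent.Close()
	defer child.Close()
	if err := sendFile(parent, f); err != nil {
		return "", "", err
	}
	rf, err := recvFile(child)
	if err != nil {
		return "", "", err
	}
	defer rf.Close()
	ln2, err := net.FileListener(rf)
	if err != nil {
		return "", "", err
	}
	defer ln2.Close()
	return ln.Addr().String(), ln2.Addr().String(), nil
}

func main() {
	sent, received, err := passListener()
	fmt.Println("sent:", sent, "received:", received, "err:", err)
}
```

Both addresses should be identical, because the received fd refers to the same file description as the original listener.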

10. Hot restart in Go: how to rebuild listener through listenfd

As mentioned earlier, when I get an fd, I still don't know whether it corresponds to a tcp listener or a udpconn. What should I do? Try both.

file := os.NewFile(uintptr(fd), "inherited-fd") // the name is only a label
if file == nil {
    // fd is not a valid file descriptor
}

tcpln, err := net.FileListener(file)
// check error

udpconn, err := net.FilePacketConn(file)
// check error
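Put together, "try both" can be sketched like this. The helper names rebuild, kindOf and rebuildDemo are made up for the illustration, and the demo creates its own TCP and UDP sockets in the same process instead of inheriting them from a parent.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// rebuild first tries to interpret f as a stream listener and then as a
// packet connection. On success exactly one of the first two results is set.
func rebuild(f *os.File) (net.Listener, net.PacketConn, error) {
	if ln, err := net.FileListener(f); err == nil {
		return ln, nil, nil
	}
	pc, err := net.FilePacketConn(f)
	if err != nil {
		return nil, nil, fmt.Errorf("%s is neither a stream listener nor a packet conn: %w", f.Name(), err)
	}
	return nil, pc, nil
}

// kindOf rebuilds f and describes what came out of it.
func kindOf(f *os.File) (string, error) {
	ln, pc, err := rebuild(f)
	if err != nil {
		return "", err
	}
	if ln != nil {
		defer ln.Close()
		return "listener " + ln.Addr().Network(), nil
	}
	defer pc.Close()
	return "packetconn " + pc.LocalAddr().Network(), nil
}

// rebuildDemo runs kindOf on the fd of a TCP listener and of a UDP socket.
func rebuildDemo() (string, string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer ln.Close()
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer pc.Close()

	tf, err := ln.(*net.TCPListener).File()
	if err != nil {
		return "", "", err
	}
	defer tf.Close()
	uf, err := pc.(*net.UDPConn).File()
	if err != nil {
		return "", "", err
	}
	defer uf.Close()

	tcpKind, err := kindOf(tf)
	if err != nil {
		return "", "", err
	}
	udpKind, err := kindOf(uf)
	if err != nil {
		return "", "", err
	}
	return tcpKind, udpKind, nil
}

func main() {
	tcpKind, udpKind, err := rebuildDemo()
	fmt.Println(tcpKind, "|", udpKind, "| err:", err)
}
```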

11. Hot restart in Go: the parent process exits smoothly

How does the parent process exit smoothly? This depends on the logic in the parent process to stop smoothly.

11.1. Processing the request on the established connection

You can start from these two aspects:

  • Shut down the read side so that no new requests are accepted; the peer will notice the failure when it continues to write data;

  • Continue to process the requests that have already been received on the connection; after processing is completed, return the response and close the connection;

Alternatively, instead of shutting down the read side, wait until the connection has been idle for a period of time before closing it. Whether closing as early as possible is the better fit depends on the scenario and requirements.

If the availability requirements are more stringent, you may also need to consider passing connfd, together with the data buffered for reading and writing on it, to the child process for processing.
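As an illustration of the simpler path (stop accepting, finish what is in flight, then exit), here is a minimal sketch. It is not how trpc does it; the server type, its fields and drainDemo are hypothetical, the one-line echo protocol is invented for the demo, and the accepted channel exists only so that the demo knows a request is in flight before shutting down.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"sync"
	"time"
)

type server struct {
	ln       net.Listener
	wg       sync.WaitGroup // connections that are still being served
	stopped  chan struct{}  // closed when the accept loop has ended
	accepted chan struct{}  // demo-only hook: signalled after an accept
}

func (s *server) serve() {
	defer close(s.stopped)
	for {
		conn, err := s.ln.Accept()
		if err != nil {
			return // listener closed: no more new connections
		}
		s.wg.Add(1)
		select {
		case s.accepted <- struct{}{}:
		default:
		}
		go s.handle(conn)
	}
}

// handle reads one line, simulates some work, answers, and closes the connection.
func (s *server) handle(conn net.Conn) {
	defer s.wg.Done()
	defer conn.Close()
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return
	}
	time.Sleep(50 * time.Millisecond) // simulated request processing
	fmt.Fprintf(conn, "echo: %s", line)
}

// shutdown closes the listener and waits up to timeout for in-flight
// connections to finish. It reports whether everything drained in time.
func (s *server) shutdown(timeout time.Duration) bool {
	s.ln.Close()
	<-s.stopped // after this point no new connection can be added to wg
	done := make(chan struct{})
	go func() {
		s.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// drainDemo sends one request, shuts the server down while the request is in
// flight, and returns the reply together with the result of shutdown.
func drainDemo() (string, bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", false, err
	}
	s := &server{ln: ln, stopped: make(chan struct{}), accepted: make(chan struct{}, 1)}
	go s.serve()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", false, err
	}
	defer conn.Close()
	if _, err := fmt.Fprint(conn, "ping\n"); err != nil {
		return "", false, err
	}
	<-s.accepted // the request is now in flight
	drained := s.shutdown(time.Second)
	reply, err := bufio.NewReader(conn).ReadString('\n')
	return reply, drained, err
}

func main() {
	reply, drained, err := drainDemo()
	fmt.Printf("reply=%q drained=%v err=%v\n", reply, drained, err)
}
```

The timeout in shutdown is the knob that trades availability against restart latency: a connection that never becomes idle would otherwise keep the parent alive forever.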

11.2. Message Service

  • Confirm whether the message consumption and acknowledgement mechanism of your service is reasonable

  • Stop receiving new messages

  • Exit after processing the messages already received

11.3. Customize AtExit cleanup task

Some services have custom tasks that should be executed before the process exits. The framework can provide a registration function similar to AtExit, allowing the process to run business-defined cleanup logic before exiting.

Whether for a smooth restart or any other normal exit, there is a real need for this support.
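Such a registration function could look roughly like the sketch below. AtExit and RunExitHooks are hypothetical names, not an existing API, and running the hooks in reverse registration order (like defer) is a design choice made for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu    sync.Mutex
	hooks []func()
)

// AtExit registers f to be run by RunExitHooks.
func AtExit(f func()) {
	mu.Lock()
	defer mu.Unlock()
	hooks = append(hooks, f)
}

// RunExitHooks runs the registered hooks in reverse registration order.
// Each hook runs at most once, even if RunExitHooks is called again.
func RunExitHooks() {
	mu.Lock()
	pending := hooks
	hooks = nil
	mu.Unlock()

	for i := len(pending) - 1; i >= 0; i-- {
		pending[i]()
	}
}

func main() {
	AtExit(func() { fmt.Println("flush logs") })
	AtExit(func() { fmt.Println("deregister from name service") })

	// The smooth-exit path of the parent process would call this
	// right before the process terminates.
	RunExitHooks()
}
```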

12. Other

In some scenarios, it is also desirable to transfer connfd, including the corresponding read and write data on connfd.

For example, with connection reuse, the client may send multiple requests over the same connection. If the server performs a hot restart at some point in between and simply shuts down the read side of the connection, the client's subsequent sends will fail; if the server closes the connection outright, requests that were already received may never get a response. In this case, you can let the server continue to process the requests on the connection and close it once it is idle. Could the connection never become idle? That is possible.

In fact, the server cannot predict whether the client uses connection multiplexing, so it is better to choose the more reliable approach. If the requirements are demanding and you do not want to rely on retries by the upper layer, you can consider passing connfd, together with the data buffered for reading and writing on it, to the child process and letting the child process handle it. This involves more points to pay attention to and the processing is more complicated. If you are interested, you can refer to the implementation of mosn.

13. Summary

As a way to ensure the smooth restart and upgrade of services, hot restart is still very valuable today. This article describes some general ideas for implementing a hot restart, and describes how to implement it in the go service step by step through a demo. Although I haven't provided a complete hot restart example for everyone, I believe you should be able to implement it by yourself after reading it.

Due to the limited level of the author, it is inevitable that there will be omissions in the description. Please correct me.

Reference article

  1. Advanced Programming in the UNIX Environment: Interprocess Communication, W. Richard Stevens

  2. mosn startup process: https://mosn.io/blog/code/mosn-startup/

Origin blog.csdn.net/Tencent_TEG/article/details/108505187