Several ways to prevent concurrency from crashing downstream services

Foreword

In the previous article I described how using sleep for concurrency control brought down a downstream service, and it drew many responses from readers. Some asked whether the approach they usually use would work, some suggested borrowing TCP's congestion control strategy to adjust the number of concurrent requests dynamically, and some asked why I should care whether the downstream can withstand the load at all.

Today I will summarize several schemes for concurrency control when calling downstream services.

Because this is an introductory article, its main purpose is to summarize how to control concurrency properly while still enjoying the efficiency it brings, so that the upstream and downstream of the whole system stay stable. Which service should add the protection, and who is responsible when an incident happens, are outside the scope of this discussion.

Concurrency Control Scheme

As mentioned earlier, the biggest drawback of using sleep for concurrency control is that it ignores the state of the downstream service. After starting a fixed number of goroutines, the caller sleeps for 1 second instead of waiting for feedback from the downstream service before starting the next batch of tasks.

func badConcurrency() {
 batchSize := 500
 for {
  data, _ := queryDataWithSizeN(batchSize)
  if len(data) == 0 {
   break
  }

  for _, item := range data {
   go func(i int) {
    doSomething(i)
   }(item)
  }

  time.Sleep(time.Second * 1)
 }
}

In addition, the requests sent by the upstream are unevenly distributed. No requests are sent at all during the sleep, and as soon as the sleep ends a new batch is fired off immediately, whether or not the downstream has finished the previous one.

Therefore, we should approach concurrency control from two angles: waiting for downstream feedback, and distributing requests as evenly as possible. Of course, in real projects the two should be combined.

For the executable sample code of this article, please visit the link below:

https://github.com/kevinyan815/gocookbook/blob/master/codes/prevent_over_concurrency/main.go

Use a rate limiter

When we send concurrent requests downstream, we can throttle them with a rate limiter: once the limit is reached, the caller blocks until a request may be sent again. On hearing "block until...", some readers immediately get excited and want to build a rate limiter on top of a channel. (Cue a cough here.) That is completely unnecessary: the Wait method of Go's official rate limiter package, golang.org/x/time/rate, gives us exactly this.

func useRateLimit() {
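 // Refill 1 token per second with a burst of 500: the first 500 Wait calls
 // return immediately, after that roughly one request per second gets through.
 limiter := rate.NewLimiter(rate.Every(1*time.Second), 500)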
 batchSize := 500
 for {
  data, _ := queryDataWithSizeN(batchSize)
  if len(data) == 0 {
   fmt.Println("End of all data")
   break
  }

  for _, item := range data {
   // Block until the token bucket has enough tokens
   err := limiter.Wait(context.Background())
   if err != nil {
    fmt.Println("Error: ", err)
    return
   }
   go func(i int) {
    doSomething(i)
   }(item)
  }
 }
}

// Simulate calling a downstream service
func doSomething(i int) {
 time.Sleep(2 * time.Second)
 fmt.Println("End:", i)
}

// Simulate querying N rows of data
func queryDataWithSizeN(size int) (dataList []int, err error) {
 rand.Seed(time.Now().Unix())
 dataList = rand.Perm(size)
 return
}

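The rate limiter provided by the time/rate package uses the token bucket algorithm. With the Wait method, a caller blocks when the bucket does not hold enough tokens and resumes once a token can be taken. The wait timeout can also be set through the Context parameter that Wait accepts. The limiter puts tokens into the bucket at a constant rate, so requests are spread more evenly than with a plain time.Sleep.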

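For a detailed guide to using the time/rate limiter, see my earlier article: "A Detailed Guide to Golang's Official Rate Limiter".

A rate limiter only makes our concurrent requests more evenly distributed. Ideally, we would start the next round of concurrent requests only after the downstream has reported that the previous round is done.

Use WaitGroup

We can wait until the previous batch of concurrent requests has finished before starting the next batch. Most readers will probably think of adding a WaitGroup as soon as they hear this.

WaitGroup fits the fork-and-wait scenario: one goroutine waits at a checkpoint for a group of worker goroutines to finish their tasks. If the workers have not all finished, the waiting goroutine blocks at the checkpoint and continues only after every worker goroutine is done.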

func useWaitGroup() {

 batchSize := 500
 for {
  data, _ := queryDataWithSizeN(batchSize)
  if len(data) == 0 {
   fmt.Println("End of all data")
   break
  }
  var wg sync.WaitGroup
  for _, item := range data {
   wg.Add(1)
   go func(i int) {
    doSomething(i)
    wg.Done()
   }(item)
  }
  wg.Wait()

  fmt.Println("Next bunch of data")
 }
}

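Here the caller waits until the whole batch has finished before querying the next batch of data and sending the next batch of requests. The waiting time depends on how long the slowest response in the batch takes.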

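Use Semaphore

If you don't want to wait for a whole batch to finish before starting the next one, you can adopt a strategy where a new task starts as soon as one finishes. Compared with WaitGroup-based control, the whole job completes faster when the downstream has enough resources. This strategy needs a semaphore for concurrency control. Go provides one as a concurrency primitive in the extension library golang.org/x/sync/semaphore.

For how semaphores are used and how they are implemented, see my earlier article: "Concurrent Programming: How to Use Semaphores and How They Are Implemented".

The program above, changed to use semaphore.Weighted for concurrency control, looks like this: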

func useSemaphore() {
 var concurrentNum int64 = 10
 var weight int64 = 1
 var batchSize int = 50
 s := semaphore.NewWeighted(concurrentNum)
 for {
  data, _ := queryDataWithSizeN(batchSize)
  if len(data) == 0 {
   fmt.Println("End of all data")
   break
  }

  for _, item := range data {
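   // Block until one of the concurrentNum slots is free
   if err := s.Acquire(context.Background(), weight); err != nil {
    fmt.Println("Error: ", err)
    return
   }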
   go func(i int) {
    doSomething(i)
    s.Release(weight)
   }(item)
  }

 }
}

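Use the producer-consumer pattern

Quite a few readers also replied that a thread pool is required. Every company probably has its own thread pool implementation already in use, so just use that; I won't embarrass myself by implementing one here. As I see it, what we really need is a producer-consumer pattern: the pool limits us to a fixed number of consumer threads that call the downstream service, while the producer fetches the data from storage.

A channel is exactly the right medium between the two.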

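func useChannel() {
 batchSize := 50
 // Number of consumer goroutines, kept separate from batchSize
 // so that the two can be changed independently.
 workerNum := 50
 dataChan := make(chan int)
 var wg sync.WaitGroup
 wg.Add(workerNum + 1)
 // Producer
 go func() {
  defer wg.Done()
  for {
   data, _ := queryDataWithSizeN(batchSize)
   if len(data) == 0 {
    break
   }
   for _, item := range data {
    dataChan <- item
   }
  }
  // Closing the channel tells the consumers there is no more data.
  close(dataChan)
 }()
 // Consumers
 for i := 0; i < workerNum; i++ {
  go func() {
   defer wg.Done()
   // The range loop ends once dataChan is closed and drained.
   for v := range dataChan {
    doSomething(v)
   }
  }()
 }

 wg.Wait()
}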

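This implementation could be simplified further by replacing WaitGroup with ErrorGroup (errgroup.Group from golang.org/x/sync/errgroup). I leave that for readers to explore.

For a summary of how to use ErrorGroup, I recommend the article: "Find WaitGroup Awkward? Try ErrorGroup!"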

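Summary

The methods summarized in this article show that in concurrent programming, beyond the number of concurrent threads we start, it is even more important to pay attention to the feedback from the lower-level services we call asynchronously. Blindly raising the concurrency does not solve the problem. It is important to understand why we need to care about that feedback. Otherwise, every scheme listed here could simply start goroutines inside goroutines and return without caring whether they finish, nesting without end.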

This article is shared from the WeChat official account HHFCodeRv (hhfcodearts).