hystrix-go: source code analysis of an avalanche-protection library

Reading source code is like studying martial-arts manuals in a wuxia novel: you analyze the moves one by one, extract the essence, and strengthen your own internal skills.
A previous post discussed the avalanche effect in microservices and common solutions, but it was mostly theory and never got into code. There are many open-source libraries on GitHub that tackle the avalanche problem; the best known is Netflix's hystrix, a Java library that combines rate limiting, circuit breaking, and fault tolerance. Today we analyze hystrix-go, the Go version of hystrix. It is fair to call it a simplified port: it implements the main features in very little code, and I recommend reading it when you have time.

Basic usage

hystrix is very simple to use. For synchronous execution, call the Do method directly:

err := hystrix.Do("my_command", func() error {
   // talk to other services
   return nil
}, func(err error) error {
   // do this when services are down
   return nil
})

For asynchronous execution there is the Go method, which internally starts a goroutine. If you want data out of your own function, you have to pass it through a channel; errors are likewise returned on a channel:

output := make(chan bool, 1)
errors := hystrix.Go("my_command", func() error {
    // talk to other services
    output <- true
    return nil
}, nil)

select {
case out := <-output:
    // success
    _ = out
case err := <-errors:
    // failure
    _ = err
}

A rough flow chart of the implementation:
[figure: command execution flow chart]

In fact, both Do and Go internally call the hystrix.GoC method; the difference is that Do wraps the asynchronous flow so the call behaves synchronously:

func DoC(ctx context.Context, name string, run runFuncC, fallback fallbackFuncC) error {
    done := make(chan struct{}, 1)
    r := func(ctx context.Context) error {
        err := run(ctx)
        if err != nil {
            return err
        }
        done <- struct{}{}
        return nil
    }
    f := func(ctx context.Context, e error) error {
        err := fallback(ctx, e)
        if err != nil {
            return err
        }
        done <- struct{}{}
        return nil
    }
    var errChan chan error
    if fallback == nil {
        errChan = GoC(ctx, name, r, nil)
    } else {
        errChan = GoC(ctx, name, r, f)
    }

    select {
    case <-done:
        return nil
    case err := <-errChan:
        return err
    }
}
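Note how DoC makes the asynchronous GoC behave synchronously: both the wrapped run and the wrapped fallback write to the buffered done channel when they complete without error, so the final select returns nil on success or the first error that GoC pushes onto errChan.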

Custom Command Configuration

Before calling Do, Go, or the other entry points, we can customize a command's configuration:

    hystrix.ConfigureCommand("mycommand", hystrix.CommandConfig{
        Timeout:                3000, // in milliseconds, i.e. 3 seconds
        MaxConcurrentRequests:  100,
        SleepWindow:            5000, // in milliseconds, i.e. 5 seconds
        RequestVolumeThreshold: 30,
        ErrorPercentThreshold:  50,
    })

    err := hystrix.DoC(context.Background(), "mycommand", func(ctx context.Context) error {
        // ...
        return nil
    }, func(i context.Context, e error) error {
        // ...
        return e
    })

Let me briefly explain what each CommandConfig field means:

  • Timeout: the execution timeout for the command. The default is 1000 milliseconds.
  • MaxConcurrentRequests: the maximum number of concurrent executions of the command. The default is 10.
  • SleepWindow: how long to wait after the circuit opens before probing whether the service has recovered. The default is 5000 milliseconds.
  • RequestVolumeThreshold: the number of requests within the 10-second statistics window that must be reached before the breaker even considers opening. The default is 20.
  • ErrorPercentThreshold: the error percentage; once the request count exceeds RequestVolumeThreshold and the error rate reaches this percentage, the breaker opens. The default is 50.

Of course, any field you leave unconfigured falls back to its default. A sketch of how the breaker consumes these thresholds follows.
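
To make the interplay of these thresholds concrete, here is a sketch of the breaker's IsOpen check, paraphrased from the library source (exact details may vary between versions): the breaker never opens while traffic is below RequestVolumeThreshold, and opens only when the health check driven by ErrorPercentThreshold fails.

// Sketch of CircuitBreaker.IsOpen, paraphrased from the hystrix-go source;
// exact details may vary between versions.
func (circuit *CircuitBreaker) IsOpen() bool {
    circuit.mutex.RLock()
    o := circuit.forceOpen || circuit.open
    circuit.mutex.RUnlock()

    if o {
        return true
    }

    // below RequestVolumeThreshold in the rolling window: never open
    if uint64(circuit.metrics.Requests().Sum(time.Now())) < getSettings(circuit.Name).RequestVolumeThreshold {
        return false
    }

    // enough traffic and the error percentage is past the threshold: open
    if !circuit.metrics.IsHealthy(time.Now()) {
        circuit.setOpen()
        return true
    }

    return false
}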

That covers usage; next is the source code analysis. I will walk through the code and execution flow from the bottom layers up.

Statistics controller

Each command has a default metrics collector, and you can also register multiple custom collectors.
The default collector, DefaultMetricCollector, holds all of the circuit breaker's state: the number of calls, failures, rejections, and so on.

type DefaultMetricCollector struct {
    mutex *sync.RWMutex

    numRequests *rolling.Number
    errors      *rolling.Number

    successes               *rolling.Number
    failures                *rolling.Number
    rejects                 *rolling.Number
    shortCircuits           *rolling.Number
    timeouts                *rolling.Number
    contextCanceled         *rolling.Number
    contextDeadlineExceeded *rolling.Number

    fallbackSuccesses *rolling.Number
    fallbackFailures  *rolling.Number
    totalDuration     *rolling.Timing
    runDuration       *rolling.Timing
}
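
Each of these fields is a rolling window. When a metric result arrives, the collector's Update method increments the windows that match the result; here is a trimmed sketch (some counters omitted, and details may differ between versions):

func (d *DefaultMetricCollector) Update(r MetricResult) {
    d.mutex.RLock()
    defer d.mutex.RUnlock()

    // every attempt counts as a request; the rest depend on the outcome
    d.numRequests.Increment(r.Attempts)
    d.errors.Increment(r.Errors)
    d.successes.Increment(r.Successes)
    d.failures.Increment(r.Failures)
    d.rejects.Increment(r.Rejects)
    // ... the remaining counters follow the same pattern ...

    d.totalDuration.Add(r.TotalDuration)
    d.runDuration.Add(r.RunDuration)
}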

The most important type to look at is rolling.Number; it is where the state ultimately lives.
A Number keeps the last 10 seconds of data in Buckets, with each bucket covering a 1-second statistical window.

[figure: ten one-second buckets forming the rolling window]

type Number struct {
    Buckets map[int64]*numberBucket
    Mutex   *sync.RWMutex
}

type numberBucket struct {
    Value float64
}

In the map field Buckets map[int64]*numberBucket, the key is the current Unix timestamp in seconds.
You may be wondering how Number ensures that only the last 10 seconds of data are stored. Whenever the circuit breaker's state is modified, Number first fetches the bucket for the current second, creating it if it does not exist:

func (r *Number) getCurrentBucket() *numberBucket {
    now := time.Now().Unix()
    var bucket *numberBucket
    var ok bool

    if bucket, ok = r.Buckets[now]; !ok {
        bucket = &numberBucket{}
        r.Buckets[now] = bucket
    }

    return bucket
}

Buckets older than 10 seconds are removed afterwards:

func (r *Number) removeOldBuckets() {
    now := time.Now().Unix() - 10

    for timestamp := range r.Buckets {
        // TODO: configurable rolling window
        if timestamp <= now {
            delete(r.Buckets, timestamp)
        }
    }
}

For example, the Increment method first gets the current bucket, updates it, and then deletes the old buckets:

func (r *Number) Increment(i float64) {
    if i == 0 {
        return
    }

    r.Mutex.Lock()
    defer r.Mutex.Unlock()

    b := r.getCurrentBucket()
    b.Value += i
    r.removeOldBuckets()
}
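
The read side follows the same pattern. The Sum method (sketched here from the rolling package; the 10-second window is hard-coded) totals every bucket that still falls inside the window:

func (r *Number) Sum(now time.Time) float64 {
    sum := float64(0)

    r.Mutex.RLock()
    defer r.Mutex.RUnlock()

    // only buckets from the last 10 seconds contribute to the sum
    for timestamp, bucket := range r.Buckets {
        if timestamp >= now.Unix()-10 {
            sum += bucket.Value
        }
    }

    return sum
}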

The metrics collector is the most fundamental and most important piece of the implementation: every decision made in the upper layers is based on the data it maintains.

Reporting execution status

circuit breaker --> execute --> report execution status --> save into the corresponding Buckets

[figure: execution status reporting flow]

Each circuit breaker reports its execution status while a command runs:

// ReportEvent records command metrics for tracking recent error rates and exposing data to the dashboard.
func (circuit *CircuitBreaker) ReportEvent(eventTypes []string, start time.Time, runDuration time.Duration) error {
    // ...
    circuit.mutex.RLock()
    o := circuit.open
    circuit.mutex.RUnlock()
    if eventTypes[0] == "success" && o {
        circuit.setClose()
    }
    var concurrencyInUse float64
    if circuit.executorPool.Max > 0 {
        concurrencyInUse = float64(circuit.executorPool.ActiveCount()) / float64(circuit.executorPool.Max)
    }
    select {
    case circuit.metrics.Updates <- &commandExecution{
        Types:            eventTypes,
        Start:            start,
        RunDuration:      runDuration,
        ConcurrencyInUse: concurrencyInUse,
    }:
    default:
        return CircuitError{Message: fmt.Sprintf("metrics channel (%v) is at capacity", circuit.Name)}
    }

    return nil
}
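
Two details are worth pointing out. First, if the breaker is currently open and a probe request reports success, setClose is called, so a successful SleepWindow probe is what closes the breaker again. Second, the send into circuit.metrics.Updates sits in a select with a default branch: if the buffered channel is full, the event is dropped with an error instead of blocking the command.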

The circuit.metrics.Updates channel is what carries the reported information. The struct that receives these execution status reports is metricExchange; it is very simple, with only four fields. The important ones are:

  • the Updates field: a buffered channel, 2000 entries by default, through which every status update flows
  • the metricCollectors field: it stores all the details gathered while this command executes

type metricExchange struct {
    Name    string
    Updates chan *commandExecution
    Mutex   *sync.RWMutex

    metricCollectors []metricCollector.MetricCollector
}

type commandExecution struct {
    Types            []string      `json:"types"`
    Start            time.Time     `json:"start_time"`
    RunDuration      time.Duration `json:"run_duration"`
    ConcurrencyInUse float64       `json:"concurrency_inuse"`
}

func newMetricExchange(name string) *metricExchange {
    m := &metricExchange{}
    m.Name = name

    m.Updates = make(chan *commandExecution, 2000)
    m.Mutex = &sync.RWMutex{}
    m.metricCollectors = metricCollector.Registry.InitializeMetricCollectors(name)
    m.Reset()

    go m.Monitor()

    return m
}

When newMetricExchange runs it starts a goroutine, go m.Monitor(), which watches Updates for data and forwards it to the metricCollectors; they store the execution details mentioned earlier, such as call counts, failure counts, and rejection counts.

func (m *metricExchange) Monitor() {
    for update := range m.Updates {
        // we only grab a read lock to make sure Reset() isn't changing the numbers.
        m.Mutex.RLock()

        totalDuration := time.Since(update.Start)
        wg := &sync.WaitGroup{}
        for _, collector := range m.metricCollectors {
            wg.Add(1)
            go m.IncrementMetrics(wg, collector, update, totalDuration)
        }
        wg.Wait()

        m.Mutex.RUnlock()
    }
}

The update is done by go m.IncrementMetrics(wg, collector, update, totalDuration), which switches on the event type:

func (m *metricExchange) IncrementMetrics(wg *sync.WaitGroup, collector metricCollector.MetricCollector, update *commandExecution, totalDuration time.Duration) {
    // granular metrics
    r := metricCollector.MetricResult{
        Attempts:         1,
        TotalDuration:    totalDuration,
        RunDuration:      update.RunDuration,
        ConcurrencyInUse: update.ConcurrencyInUse,
    }
    switch update.Types[0] {
    case "success":
        r.Successes = 1
    case "failure":
        r.Failures = 1
        r.Errors = 1
    case "rejected":
        r.Rejects = 1
        r.Errors = 1
    // ...
    }
    // ...
    collector.Update(r)
    wg.Done()
}

Flow control

The flow-control code in hystrix-go is very simple. It uses a plain token algorithm: a caller that obtains a token may proceed with the work and must return the token when finished; a caller that cannot get a token is rejected, and the user-supplied fallback is invoked (if none was set, nothing runs).
The struct executorPool is the concrete implementation of hystrix-go's flow control. Its Max field is the maximum number of concurrent executions.

type executorPool struct {
    Name    string
    Metrics *poolMetrics
    Max     int
    Tickets chan *struct{}
}

When an executorPool is created, tokens are minted according to Max; if Max was not configured, the default of 10 is used:

func newExecutorPool(name string) *executorPool {
    p := &executorPool{}
    p.Name = name
    p.Metrics = newPoolMetrics(name)
    p.Max = getSettings(name).MaxConcurrentRequests

    p.Tickets = make(chan *struct{}, p.Max)
    for i := 0; i < p.Max; i++ {
        p.Tickets <- &struct{}{}
    }

    return p
}
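
Note the design here: Tickets is a buffered channel pre-filled with Max empty-struct tokens, so the channel itself acts as the semaphore. Acquiring a token is a channel receive and returning one is a send; no extra counters or locks are needed.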

Flow-control status reporting

Note the Metrics field: it tracks execution counts, such as the total number of executions and the peak concurrency. I won't paste that code here. These numbers can also be exposed so that a visualization can present them at a glance.

Tokens must be returned after use, and it is on return that the statistics above are recorded:

func (p *executorPool) Return(ticket *struct{}) {
    if ticket == nil {
        return
    }

    p.Metrics.Updates <- poolMetricsUpdate{
        activeCount: p.ActiveCount(),
    }
    p.Tickets <- ticket
}

func (p *executorPool) ActiveCount() int {
    return p.Max - len(p.Tickets)
}
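
Putting the two halves together, the caller's side of the token flow looks roughly like the sketch below, with pool standing in for circuit.executorPool. This is a simplified illustration of the pattern GoC uses in the next section, not the library's literal code; the real version also coordinates with sync.Once and a condition variable.

// simplified sketch of acquiring and returning a ticket
select {
case ticket := <-pool.Tickets:
    // got a token: run the work, then give the token back
    defer pool.Return(ticket)
    // ... run the user's function ...
default:
    // no token left: max concurrency reached, reject immediately
    // and invoke the fallback with ErrMaxConcurrency
}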

The execution flow of a single command

With the metrics collector, flow control, and status reporting covered, the final step is to trace everything a single command goes through:

err := hystrix.Do("my_command", func() error {
    // talk to other services
    return nil
}, func(err error) error {
    // do this when services are down
    return nil
})

As mentioned earlier, executing a command comes down to a call to the GoC method. I've pasted it below with some code removed for length, but the main logic is intact: check whether the breaker is already open, obtain a Ticket or be rate-limited, run our own function, and watch whether the context is Done or the execution times out.
Naturally, every outcome reports its execution status, and the Ticket is returned at the end.

func GoC(ctx context.Context, name string, run runFuncC, fallback fallbackFuncC) chan error {
    cmd := &command{
        run:      run,
        fallback: fallback,
        start:    time.Now(),
        errChan:  make(chan error, 1),
        finished: make(chan bool, 1),
    }
    // get the circuit breaker, creating it if it does not exist
    circuit, _, err := GetCircuit(name)
    if err != nil {
        cmd.errChan <- err
        return cmd.errChan
    }
    //...
    // return the ticket
    returnTicket := func() {
        // ...
        cmd.circuit.executorPool.Return(cmd.ticket)
    }
    // report execution status
    reportAllEvent := func() {
        err := cmd.circuit.ReportEvent(cmd.events, cmd.start, cmd.runDuration)
        // ...
    }
    go func() {
        defer func() { cmd.finished <- true }()
        // check whether the breaker is open
        if !cmd.circuit.AllowRequest() {
            // ...
            returnOnce.Do(func() {
                returnTicket()
                cmd.errorWithFallback(ctx, ErrCircuitOpen)
                reportAllEvent()
            })
            return
        }
        // ...
        // acquire a ticket; reject (rate-limit) if none is available
        select {
        case cmd.ticket = <-circuit.executorPool.Tickets:
            ticketChecked = true
            ticketCond.Signal()
            cmd.Unlock()
        default:
            // ...
            returnOnce.Do(func() {
                returnTicket()
                cmd.errorWithFallback(ctx, ErrMaxConcurrency)
                reportAllEvent()
            })
            return
        }
        // run our own function and report the execution info
        runStart := time.Now()
        runErr := run(ctx)
        returnOnce.Do(func() {
            defer reportAllEvent()
            cmd.runDuration = time.Since(runStart)
            returnTicket()
            if runErr != nil {
                cmd.errorWithFallback(ctx, runErr)
                return
            }
            cmd.reportEvent("success")
        })
    }()
    // wait for the context to be done or the execution to time out, and report
    go func() {
        timer := time.NewTimer(getSettings(name).Timeout)
        defer timer.Stop()

        select {
        case <-cmd.finished:
            // returnOnce has been executed in another goroutine
        case <-ctx.Done():
            // ...
            return
        case <-timer.C:
            // ...
        }
    }()

    return cmd.errChan
}

Reporting to the hystrix dashboard for visualization

The StreamHandler pushes the state of all circuit breakers to the dashboard as a continuous stream. I won't walk through that code; it is very simple.
You only need to add three lines to your server to start the streaming service:

    hystrixStreamHandler := hystrix.NewStreamHandler()
    hystrixStreamHandler.Start()
    go http.ListenAndServe(net.JoinHostPort("", "81"), hystrixStreamHandler)
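
For context, a minimal end-to-end server might look like the sketch below. The /hello route, the command name, and the ports are assumptions for illustration, not part of the library:

package main

import (
    "net/http"

    "github.com/afex/hystrix-go/hystrix"
)

func main() {
    // expose the metrics stream for the dashboard (port 81 assumed)
    hystrixStreamHandler := hystrix.NewStreamHandler()
    hystrixStreamHandler.Start()
    go http.ListenAndServe(":81", hystrixStreamHandler)

    // a hypothetical endpoint whose downstream call is wrapped in hystrix
    http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
        err := hystrix.Do("my_command", func() error {
            // talk to other services
            return nil
        }, func(err error) error {
            // fallback when the call fails or the breaker is open
            return nil
        })
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.Write([]byte("ok"))
    })

    http.ListenAndServe(":8080", nil)
}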

For the dashboard I'm using the docker version:

docker run -d -p 8888:9002 --name hystrix-dashboard mlabouardy/hystrix-dashboard:latest

[figure: hystrix dashboard home page]

Enter your service's stream address into the dashboard; mine was:
http://192.168.1.67:81/hystrix.stream

[figure: dashboard showing the command's live metrics]

If you run a cluster, you can use Turbine to aggregate the monitoring; take a look at it when you have time.

[figure: Turbine cluster monitoring]

Origin: blog.csdn.net/mi_duo/article/details/92836521