Reading source code is like studying a martial arts manual: you work through an expert's moves one by one, extract the essence, and build up your own inner strength.
The previous post covered the avalanche effect in microservices and the common solutions, but it was light on substance and did not show how those solutions look in code. There are many open source libraries on GitHub that address the avalanche problem. The best known is Netflix's open source library hystrix, a Java library that combines flow control, circuit breaking, and fault tolerance. The library analyzed today is hystrix-go, the Go version of hystrix. It is better described as a simplified version: it implements the main features with very little code. I recommend reading it when you have time.
Simple usage
hystrix is very simple to use. For synchronous execution, call the Do method directly.
err := hystrix.Do("my_command", func() error {
// talk to other services
return nil
}, func(err error) error {
// do this when services are down
return nil
})
For asynchronous execution, use the Go method. Internally it starts a goroutine. If you want to get data out of your own function, you need to pass it through a channel (output in the example below). The returned error value is also a channel.
output := make(chan bool, 1)
errors := hystrix.Go("my_command", func() error {
// talk to other services
output <- true
return nil
}, nil)
select {
case out := <-output:
// success
_ = out
case err := <-errors:
// failure
_ = err
}
Rough execution flow
In fact, both Do and Go end up calling hystrix.GoC internally. The Do path (DoC) simply wraps the asynchronous call and blocks until it finishes:
func DoC(ctx context.Context, name string, run runFuncC, fallback fallbackFuncC) error {
done := make(chan struct{}, 1)
r := func(ctx context.Context) error {
err := run(ctx)
if err != nil {
return err
}
done <- struct{}{}
return nil
}
f := func(ctx context.Context, e error) error {
err := fallback(ctx, e)
if err != nil {
return err
}
done <- struct{}{}
return nil
}
var errChan chan error
if fallback == nil {
errChan = GoC(ctx, name, r, nil)
} else {
errChan = GoC(ctx, name, r, f)
}
select {
case <-done:
return nil
case err := <-errChan:
return err
}
}
Custom command configuration
Before calling Do, Go, or the other methods, we can customize some configuration:
hystrix.ConfigureCommand("mycommand", hystrix.CommandConfig{
Timeout: 3000, // milliseconds
MaxConcurrentRequests: 100,
SleepWindow: 5000, // milliseconds
RequestVolumeThreshold: 30,
ErrorPercentThreshold: 50,
})
err := hystrix.DoC(context.Background(), "mycommand", func(ctx context.Context) error {
// ...
return nil
}, func(i context.Context, e error) error {
// ...
return e
})
Here is what the CommandConfig fields mean:
- Timeout: the command execution timeout, in milliseconds. The default is 1000 milliseconds.
- MaxConcurrentRequests: the maximum number of concurrent executions of the command. The default is 10.
- SleepWindow: once the circuit breaker is open, how long to wait before testing whether the service is available again. The default is 5000 milliseconds.
- RequestVolumeThreshold: the number of requests within the 10-second statistics window. The request count must reach this value before the circuit breaker decides whether to open. The default is 20.
- ErrorPercentThreshold: the error percentage. Once the request count is at least RequestVolumeThreshold and the error rate reaches this percentage, the circuit breaker opens. The default is 50.
If you do not configure them, the defaults are used.
That covers usage. The next step is to analyze the source code. I will go through the code and the execution flow from the bottom layer up.
Statistics controller
Each command has a default statistics controller, and you can add more custom controllers.
The default statistics controller, DefaultMetricCollector, holds all of the circuit breaker's state: the number of calls, the number of failures, the number of rejections, and so on.
type DefaultMetricCollector struct {
mutex *sync.RWMutex
numRequests *rolling.Number
errors *rolling.Number
successes *rolling.Number
failures *rolling.Number
rejects *rolling.Number
shortCircuits *rolling.Number
timeouts *rolling.Number
contextCanceled *rolling.Number
contextDeadlineExceeded *rolling.Number
fallbackSuccesses *rolling.Number
fallbackFailures *rolling.Number
totalDuration *rolling.Timing
runDuration *rolling.Timing
}
The most important piece is rolling.Number, which is where the state is ultimately stored. Number keeps 10 seconds' worth of Buckets, and each Bucket covers 1 second of statistics.
type Number struct {
Buckets map[int64]*numberBucket
Mutex *sync.RWMutex
}
type numberBucket struct {
Value float64
}
The key of the map field Buckets map[int64]*numberBucket is the current time as a Unix timestamp in seconds.
You may be wondering how Number makes sure it only keeps data from the last 10 seconds. Every time the circuit breaker's state is modified, Number first looks up the Bucket for the current time (in seconds) and creates it if it does not exist.
func (r *Number) getCurrentBucket() *numberBucket {
now := time.Now().Unix()
var bucket *numberBucket
var ok bool
if bucket, ok = r.Buckets[now]; !ok {
bucket = &numberBucket{}
r.Buckets[now] = bucket
}
return bucket
}
Buckets older than 10 seconds are removed:
func (r *Number) removeOldBuckets() {
now := time.Now().Unix() - 10
for timestamp := range r.Buckets {
// TODO: configurable rolling window
if timestamp <= now {
delete(r.Buckets, timestamp)
}
}
}
For example, the Increment method first gets the current Bucket, updates it, and then deletes the old data:
func (r *Number) Increment(i float64) {
if i == 0 {
return
}
r.Mutex.Lock()
defer r.Mutex.Unlock()
b := r.getCurrentBucket()
b.Value += i
r.removeOldBuckets()
}
The statistics controller is the most basic and most important part of the implementation: every decision made in the upper layers is based on its data.
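To make the bucket idea concrete, here is a minimal sketch of a rolling counter keyed by Unix second. It is a simplification under stated assumptions: rollingCounter and its methods are hypothetical names, the current time is passed in so the behavior is deterministic, and there is no locking, whereas the code above guards the map with a mutex.

```go
package main

import "fmt"

// rollingCounter is a simplified, single-goroutine sketch of the bucket idea.
type rollingCounter struct {
	buckets map[int64]float64 // key: Unix second, value: count for that second
	window  int64             // window length in seconds
}

func newRollingCounter(window int64) *rollingCounter {
	return &rollingCounter{buckets: make(map[int64]float64), window: window}
}

// increment adds n to the bucket for second now, then evicts expired buckets.
func (r *rollingCounter) increment(now int64, n float64) {
	r.buckets[now] += n
	for ts := range r.buckets {
		if ts <= now-r.window {
			delete(r.buckets, ts)
		}
	}
}

// sum adds up the buckets that still fall inside the window ending at now.
func (r *rollingCounter) sum(now int64) float64 {
	var total float64
	for ts, v := range r.buckets {
		if ts > now-r.window {
			total += v
		}
	}
	return total
}

func main() {
	c := newRollingCounter(10)
	c.increment(100, 1)
	c.increment(105, 2)
	fmt.Println(c.sum(105)) // 3: both buckets are inside the window
	c.increment(111, 1)     // evicts the bucket for second 100
	fmt.Println(c.sum(111), len(c.buckets))
	fmt.Println(c.sum(116)) // only the bucket for second 111 still counts
}
```

Because eviction only happens on writes, sum also filters by timestamp so that stale buckets are ignored on a quiet counter.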
Reporting execution status
circuit breaker --> execute --> report execution status --> save to the corresponding Buckets
Each circuit breaker reports its execution status while it runs:
// ReportEvent records command metrics for tracking recent error rates and exposing data to the dashboard.
func (circuit *CircuitBreaker) ReportEvent(eventTypes []string, start time.Time, runDuration time.Duration) error {
// ...
circuit.mutex.RLock()
o := circuit.open
circuit.mutex.RUnlock()
if eventTypes[0] == "success" && o {
circuit.setClose()
}
var concurrencyInUse float64
if circuit.executorPool.Max > 0 {
concurrencyInUse = float64(circuit.executorPool.ActiveCount()) / float64(circuit.executorPool.Max)
}
select {
case circuit.metrics.Updates <- &commandExecution{
Types: eventTypes,
Start: start,
RunDuration: runDuration,
ConcurrencyInUse: concurrencyInUse,
}:
default:
return CircuitError{Message: fmt.Sprintf("metrics channel (%v) is at capacity", circuit.Name)}
}
return nil
}
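circuit.metrics.Updates is the channel that handles the reported information. The structure behind it is metricExchange, a simple struct with only 4 fields. The important one is the channel field Updates, a buffered channel with a default capacity of 2000. All status information passes through it. The metricCollectors field holds the collectors that store the information gathered while this command executes.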
type metricExchange struct {
Name string
Updates chan *commandExecution
Mutex *sync.RWMutex
metricCollectors []metricCollector.MetricCollector
}
type commandExecution struct {
Types []string `json:"types"`
Start time.Time `json:"start_time"`
RunDuration time.Duration `json:"run_duration"`
ConcurrencyInUse float64 `json:"concurrency_inuse"`
}
func newMetricExchange(name string) *metricExchange {
m := &metricExchange{}
m.Name = name
m.Updates = make(chan *commandExecution, 2000)
m.Mutex = &sync.RWMutex{}
m.metricCollectors = metricCollector.Registry.InitializeMetricCollectors(name)
m.Reset()
go m.Monitor()
return m
}
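The combination of a buffered channel and the select with a default branch in ReportEvent means that reporting never blocks the caller: when the buffer is full, the event is dropped and an error is returned. A minimal sketch of that pattern, with a hypothetical report function and a tiny buffer instead of 2000:

```go
package main

import (
	"errors"
	"fmt"
)

var errAtCapacity = errors.New("metrics channel is at capacity")

// report tries to enqueue an event without blocking. If the buffer is full,
// the event is dropped and an error is returned instead.
func report(updates chan<- string, event string) error {
	select {
	case updates <- event:
		return nil
	default:
		return errAtCapacity
	}
}

func main() {
	updates := make(chan string, 2) // tiny buffer for the demo
	fmt.Println(report(updates, "success"))
	fmt.Println(report(updates, "failure"))
	fmt.Println(report(updates, "timeout")) // buffer full: dropped
	<-updates                               // a consumer drains one event
	fmt.Println(report(updates, "success")) // there is room again
}
```

The trade-off is that metrics can be lost under heavy load, in exchange for never slowing down the command being measured.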
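When newMetricExchange runs, it starts a goroutine, go m.Monitor(), that watches the Updates channel and forwards each update to the metricCollectors. They store the execution data, such as the number of calls, the number of failures, and the number of rejections mentioned earlier.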
func (m *metricExchange) Monitor() {
for update := range m.Updates {
// we only grab a read lock to make sure Reset() isn't changing the numbers.
m.Mutex.RLock()
totalDuration := time.Since(update.Start)
wg := &sync.WaitGroup{}
for _, collector := range m.metricCollectors {
wg.Add(1)
go m.IncrementMetrics(wg, collector, update, totalDuration)
}
wg.Wait()
m.Mutex.RUnlock()
}
}
The update itself is done by go m.IncrementMetrics(wg, collector, update, totalDuration), which checks the event type:
func (m *metricExchange) IncrementMetrics(wg *sync.WaitGroup, collector metricCollector.MetricCollector, update *commandExecution, totalDuration time.Duration) {
// granular metrics
r := metricCollector.MetricResult{
Attempts: 1,
TotalDuration: totalDuration,
RunDuration: update.RunDuration,
ConcurrencyInUse: update.ConcurrencyInUse,
}
switch update.Types[0] {
case "success":
r.Successes = 1
case "failure":
r.Failures = 1
r.Errors = 1
case "rejected":
r.Rejects = 1
r.Errors = 1
// ...
}
// ...
collector.Update(r)
wg.Done()
}
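Flow control
The flow control code in hystrix-go is very simple. It uses a simple token algorithm: a request that obtains a token can carry on with its work and must return the token when it is done. A request that cannot obtain a token is rejected. After a rejection, the fallback function supplied by the user is called, or nothing is executed if none was set.
The executorPool struct is the concrete implementation of flow control in hystrix-go. The Max field is the maximum number of commands that may run concurrently.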
type executorPool struct {
Name string
Metrics *poolMetrics
Max int
Tickets chan *struct{}
}
When an executorPool is created, tokens are created according to the value of Max. If Max is not set, the default value of 10 is used.
func newExecutorPool(name string) *executorPool {
p := &executorPool{}
p.Name = name
p.Metrics = newPoolMetrics(name)
p.Max = getSettings(name).MaxConcurrentRequests
p.Tickets = make(chan *struct{}, p.Max)
for i := 0; i < p.Max; i++ {
p.Tickets <- &struct{}{}
}
return p
}
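Flow control status reporting
Note the Metrics field. It is used to count executions, for example the total number of executions and the maximum concurrency. I won't paste that code here. These numbers can also be exposed so that a visualization tool can display them.
A token must be returned after it has been used, and the statistics described above are only recorded when it is returned.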
func (p *executorPool) Return(ticket *struct{}) {
if ticket == nil {
return
}
p.Metrics.Updates <- poolMetricsUpdate{
activeCount: p.ActiveCount(),
}
p.Tickets <- ticket
}
func (p *executorPool) ActiveCount() int {
return p.Max - len(p.Tickets)
}
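The token scheme can be reproduced in a few lines with a buffered channel. This is a sketch with hypothetical names (tokenPool, tryAcquire, release), leaving out the metrics reporting:

```go
package main

import "fmt"

// tokenPool is a hypothetical reduction of the ticket idea: a buffered channel
// pre-filled with max tokens.
type tokenPool struct {
	max     int
	tickets chan struct{}
}

func newTokenPool(max int) *tokenPool {
	p := &tokenPool{max: max, tickets: make(chan struct{}, max)}
	for i := 0; i < max; i++ {
		p.tickets <- struct{}{}
	}
	return p
}

// tryAcquire takes a token if one is available. A false result means the
// request should be rejected.
func (p *tokenPool) tryAcquire() bool {
	select {
	case <-p.tickets:
		return true
	default:
		return false
	}
}

// release returns a token. It must be called exactly once per successful
// tryAcquire, otherwise the send would block on a full channel.
func (p *tokenPool) release() { p.tickets <- struct{}{} }

// activeCount is the number of tokens currently checked out.
func (p *tokenPool) activeCount() int { return p.max - len(p.tickets) }

func main() {
	p := newTokenPool(2)
	fmt.Println(p.tryAcquire(), p.tryAcquire(), p.tryAcquire()) // true true false
	fmt.Println(p.activeCount())                                // 2
	p.release()
	fmt.Println(p.activeCount()) // 1
}
```

Note that this limits how many commands run at the same time. It is not a rate limit per second.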
The execution flow of one command
With the statistics controller, flow control, and execution status reporting covered, the main parts of the implementation are done. The last step is to trace everything a single command goes through from start to finish:
err := hystrix.Do("my_command", func() error {
// talk to other services
return nil
}, func(err error) error {
// do this when services are down
return nil
})
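As mentioned earlier, executing a hystrix command ends up calling the GoC method. The code is pasted below. Some of it has been removed for length, but the main logic is all there: check whether the circuit breaker is open, obtain a Ticket (the request is rejected if none is available), run our own function, and check whether the context is done or the execution has timed out. Every outcome is reported as execution status, and finally the Ticket is returned.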
func GoC(ctx context.Context, name string, run runFuncC, fallback fallbackFuncC) chan error {
cmd := &command{
run: run,
fallback: fallback,
start: time.Now(),
errChan: make(chan error, 1),
finished: make(chan bool, 1),
}
// Get the circuit breaker, creating it if it does not exist
circuit, _, err := GetCircuit(name)
if err != nil {
cmd.errChan <- err
return cmd.errChan
}
//...
// Return the ticket
returnTicket := func() {
// ...
cmd.circuit.executorPool.Return(cmd.ticket)
}
// Report the execution status
reportAllEvent := func() {
err := cmd.circuit.ReportEvent(cmd.events, cmd.start, cmd.runDuration)
// ...
}
go func() {
defer func() { cmd.finished <- true }()
// Check whether the circuit breaker is open
if !cmd.circuit.AllowRequest() {
// ...
returnOnce.Do(func() {
returnTicket()
cmd.errorWithFallback(ctx, ErrCircuitOpen)
reportAllEvent()
})
return
}
// ...
// Obtain a ticket; if none is available, reject the request
select {
case cmd.ticket = <-circuit.executorPool.Tickets:
ticketChecked = true
ticketCond.Signal()
cmd.Unlock()
default:
// ...
returnOnce.Do(func() {
returnTicket()
cmd.errorWithFallback(ctx, ErrMaxConcurrency)
reportAllEvent()
})
return
}
// Run our own function and report the execution result
runStart := time.Now()
runErr := run(ctx)
returnOnce.Do(func() {
defer reportAllEvent()
cmd.runDuration = time.Since(runStart)
returnTicket()
if runErr != nil {
cmd.errorWithFallback(ctx, runErr)
return
}
cmd.reportEvent("success")
})
}()
// Wait for the context to be done or the execution to time out, and report it
go func() {
timer := time.NewTimer(getSettings(name).Timeout)
defer timer.Stop()
select {
case <-cmd.finished:
// returnOnce has been executed in another goroutine
case <-ctx.Done():
// ...
return
case <-timer.C:
// ...
}
}()
return cmd.errChan
}
Visualizing the reported information with the hystrix dashboard
The StreamHandler code continuously pushes the state of every circuit breaker to the dashboard as a stream. I won't go through this part of the code. It is very simple.
You need to add three lines of code to your server to start the streaming service:
hystrixStreamHandler := hystrix.NewStreamHandler()
hystrixStreamHandler.Start()
go http.ListenAndServe(net.JoinHostPort("", "81"), hystrixStreamHandler)
For the dashboard, I'm using the docker version.
docker run -d -p 8888:9002 --name hystrix-dashboard mlabouardy/hystrix-dashboard:latest
Enter the address of your service in the dashboard. Mine was http://192.168.1.67:81/hystrix.stream
For a cluster, you can use Turbine for monitoring. Take a look at it when you have time.