Sentinel Source Code (9): System Protection with SystemSlot and Authority Control with AuthoritySlot


1. SystemSlot

Sentinel's system adaptive flow control governs the application's inbound traffic as a whole. It combines monitoring metrics from several dimensions (application load, CPU usage, overall average RT, inbound QPS and the number of concurrent threads) with an adaptive flow control strategy to balance the system's inbound traffic against its load, so that the system runs at the highest possible throughput while remaining stable overall.

The design borrows ideas from TCP BBR congestion control: it balances the requests the system can handle against the requests coming in, rather than limiting traffic based on a single load metric. The ultimate goal is to keep the system from being overwhelmed and to improve throughput, not to force the load below some fixed threshold.

1.1 System Rules

System protection rules act on application-level inbound traffic. They monitor the application's metrics in several dimensions (single-machine load, CPU usage, average RT, inbound QPS and the number of concurrent threads) so that the system runs at the highest possible throughput while remaining stable overall.

System protection rules apply to the application as a whole, not to individual resources, and they only take effect for inbound traffic. Inbound traffic is traffic entering the application (EntryType.IN), such as requests received by a web service or a Dubbo server.

The rules support the following metrics:

  • Load adaptive (only effective on Linux/Unix-like machines): the system's load1 is used as a heuristic for adaptive system protection. System protection (the BBR phase) is triggered when load1 exceeds the configured heuristic value and the current number of concurrent threads exceeds the estimated system capacity, which is computed as maxQps * minRt. A common reference setting is CPU cores * 2.5.
  • CPU usage (version 1.5.0+): system protection is triggered when the system CPU usage exceeds the threshold (value range 0.0-1.0). This check is relatively sensitive.
  • Average RT: system protection is triggered when the average RT of all inbound traffic on a single machine reaches the threshold, in milliseconds.
  • Number of concurrent threads: system protection is triggered when the number of concurrent threads of all inbound traffic on a single machine reaches the threshold.
  • Inbound QPS: system protection is triggered when the QPS of all inbound traffic on a single machine reaches the threshold.

1.2 Principle

Imagine the system processing requests as a water pipe, with incoming requests as water poured into it. When the system processes smoothly, requests do not need to queue and pass straight through the pipe, so their RT is the shortest. Otherwise, when requests pile up, the time to process a request becomes queuing time + minimum processing time.

If we can keep the amount of water in the pipe steady and let it flow smoothly, queued requests will not grow; that is, the system load will not deteriorate any further.

Let T denote the amount of water inside the pipe (the number of requests in flight), RT the processing time of a request, and P the number of incoming requests per unit time. While one request travels from the entrance of the pipe to the exit, P * RT requests enter behind it, so the pipe holds about P * RT requests. In other words, when T ≈ QPS * Avg(RT), the system's processing capacity and the number of incoming requests are in balance, and the load will not deteriorate further.
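To make the balance condition concrete, here is a minimal Java sketch of T ≈ QPS * Avg(RT). The class PipeBalance and its methods inFlight and balanced are hypothetical names for illustration only, not Sentinel APIs; the sketch assumes RT is measured in milliseconds, so it divides by 1000 to get a request count.

```java
// Hypothetical helper illustrating T ≈ QPS * Avg(RT) (Little's law).
// Assumption: RT is in milliseconds, so the product is divided by 1000.
final class PipeBalance {

    private PipeBalance() {}

    /** Expected number of requests "inside the pipe" at steady state. */
    static double inFlight(double qps, double avgRtMs) {
        return qps * avgRtMs / 1000.0;
    }

    /** True when the observed concurrency does not exceed the steady-state estimate. */
    static boolean balanced(int currentThreads, double qps, double avgRtMs) {
        return currentThreads <= inFlight(qps, avgRtMs);
    }
}
```

For example, at 200 QPS with an average RT of 50 ms the pipe holds about 200 * 50 / 1000 = 10 requests, so 8 concurrent threads counts as balanced while 12 suggests requests are queuing.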

The remaining issue is that an equilibrium point only guarantees that the water level stops rising; it says nothing about how much water had already accumulated before the equilibrium was reached. If the pipe already holds a large amount of water, the water the system lets through can only pass slowly, RT stays high, and the accumulated water lingers in the pipe, which wastes the system's processing capacity.

Keeping the inflow rate at the maximum rate at which water can flow out of the pipe makes the fullest use of the pipe's processing capacity.

1.3 Adaptive flow control usage example

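import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.EntryType;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import com.alibaba.csp.sentinel.slots.system.SystemRule;
import com.alibaba.csp.sentinel.slots.system.SystemRuleManager;
import com.alibaba.csp.sentinel.util.TimeUtil;

public class SystemGuardDemo {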
 
    private static AtomicInteger pass = new AtomicInteger();
    private static AtomicInteger block = new AtomicInteger();
    private static AtomicInteger total = new AtomicInteger();
 
    private static volatile boolean stop = false;
    private static final int threadCount = 100;
 
    private static int seconds = 60 + 40;
 
    public static void main(String[] args) throws Exception {
        
        tick();
        // initialize the system rule parameters
        initSystemRule();
        // start worker threads that generate inbound traffic
        for (int i = 0; i < threadCount; i++) {
            Thread entryThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (true) {
                        Entry entry = null;
                        try {
                            entry = SphU.entry("methodA", EntryType.IN);
                            pass.incrementAndGet();
                            try {
                                TimeUnit.MILLISECONDS.sleep(20);
                            } catch (InterruptedException e) {
                                // ignore
                            }
                        } catch (BlockException e1) {
                            block.incrementAndGet();
                            try {
                                TimeUnit.MILLISECONDS.sleep(20);
                            } catch (InterruptedException e) {
                                // ignore
                            }
                        } catch (Exception e2) {
                            // biz exception
                        } finally {
                            total.incrementAndGet();
                            if (entry != null) {
                                entry.exit();
                            }
                        }
                    }
                }
 
            });
            entryThread.setName("working-thread");
            entryThread.start();
        }
    }
 
    private static void initSystemRule() {
        List<SystemRule> rules = new ArrayList<SystemRule>();
        SystemRule rule = new SystemRule();
        // max load is 3
        rule.setHighestSystemLoad(3.0);
        // max cpu usage is 60%
        rule.setHighestCpuUsage(0.6);
        // max avg rt of all request is 10 ms
        rule.setAvgRt(10);
        // max total qps is 20
        rule.setQps(20);
        // max parallel working thread is 10
        rule.setMaxThread(10);
 
        rules.add(rule);
        SystemRuleManager.loadRules(Collections.singletonList(rule));
    }
 
    private static void tick() {
        Thread timer = new Thread(new TimerTask());
        timer.setName("sentinel-timer-task");
        timer.start();
    }
 
    static class TimerTask implements Runnable {
        @Override
        public void run() {
            System.out.println("begin to statistic!!!");
            long oldTotal = 0;
            long oldPass = 0;
            long oldBlock = 0;
            while (!stop) {
                try {
                    TimeUnit.SECONDS.sleep(1);
                } catch (InterruptedException e) {
                }
                long globalTotal = total.get();
                long oneSecondTotal = globalTotal - oldTotal;
                oldTotal = globalTotal;
 
                long globalPass = pass.get();
                long oneSecondPass = globalPass - oldPass;
                oldPass = globalPass;
 
                long globalBlock = block.get();
                long oneSecondBlock = globalBlock - oldBlock;
                oldBlock = globalBlock;
 
                System.out.println(seconds + ", " + TimeUtil.currentTimeMillis() + ", total:"
                    + oneSecondTotal + ", pass:"
                    + oneSecondPass + ", block:" + oneSecondBlock);
                if (seconds-- <= 0) {
                    stop = true;
                }
            }
            System.exit(0);
        }
    }
}

1.4 Source code analysis

1.4.1 Adaptive flow control entry point: SystemSlot

@Spi(order = Constants.ORDER_SYSTEM_SLOT)
public class SystemSlot extends AbstractLinkedProcessorSlot<DefaultNode> {

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                      boolean prioritized, Object... args) throws Throwable {
        // entry calls SystemRuleManager.checkSystem; this is the key point of adaptive flow control
        SystemRuleManager.checkSystem(resourceWrapper, count);
        // continue to the next slot in the chain of responsibility
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }

    @Override
    public void exit(Context context, ResourceWrapper resourceWrapper, int count, Object... args) {
        fireExit(context, resourceWrapper, count, args);
    }

}
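SystemSlot does its check and then calls fireEntry to hand control to the next slot; a thrown block exception stops the chain. The sketch below models that chain-of-responsibility idea with hypothetical classes SimpleSlot and CheckSlot (not Sentinel types); the slot names used in the usage note are illustrative only.

```java
import java.util.List;

// Hypothetical, simplified slot chain: each slot runs its own check and then
// fires the next slot, in the same spirit as SystemSlot calling fireEntry(...).
abstract class SimpleSlot {
    private SimpleSlot next;

    /** Links the next slot and returns it, so calls can be chained. */
    SimpleSlot setNext(SimpleSlot next) {
        this.next = next;
        return next;
    }

    abstract void entry(String resource, List<String> trace) throws Exception;

    /** Passes control to the next slot, if there is one. */
    protected void fireEntry(String resource, List<String> trace) throws Exception {
        if (next != null) {
            next.entry(resource, trace);
        }
    }
}

final class CheckSlot extends SimpleSlot {
    private final String name;
    private final boolean pass;

    CheckSlot(String name, boolean pass) {
        this.name = name;
        this.pass = pass;
    }

    @Override
    void entry(String resource, List<String> trace) throws Exception {
        trace.add(name); // record that this slot ran
        if (!pass) {
            // Throwing stops the chain: later slots never run.
            throw new Exception(name + " blocked " + resource);
        }
        fireEntry(resource, trace);
    }
}
```

With a chain "authority" -> "system" -> "flow" where the "system" slot fails, the trace ends at "system" and the "flow" slot is never reached, which is why a SystemBlockException thrown in checkSystem prevents any later checks.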

1.4.2 SystemRuleManager

1.4.2.1 Class Diagram

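Sentinel implements adaptive flow control in the SystemRuleManager class, which encapsulates the BBR-style check as well as the collection of system metrics. Let's look at its class diagram and core fields.

[Figure: SystemRuleManager class diagram]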

Here are some of SystemRuleManager's fields:

// maximum system load
private static volatile double highestSystemLoad = Double.MAX_VALUE;
/**
 * cpu usage, between [0, 1]
 */
private static volatile double highestCpuUsage = Double.MAX_VALUE;
// maximum total inbound QPS
private static volatile double qps = Double.MAX_VALUE;
// maximum average RT
private static volatile long maxRt = Long.MAX_VALUE;
// maximum number of concurrent threads
private static volatile long maxThread = Long.MAX_VALUE;
// collects the system CPU load and CPU usage
private static SystemStatusListener statusListener = null;

1.4.2.2 Core method source code analysis

  • checkSystem:

    public static void checkSystem(ResourceWrapper resourceWrapper, int count) throws BlockException {
        // return immediately if the resource is null
        if (resourceWrapper == null) {
            return;
        }
        // Ensure the checking switch is on.
        // return immediately if system adaptive flow control is not enabled
        if (!checkSystemStatus.get()) {
            return;
        }
    
        // for inbound traffic only
        // system adaptive flow control only applies to inbound traffic
        if (resourceWrapper.getEntryType() != EntryType.IN) {
            return;
        }
    
        // total qps
        // read the current pass QPS from Constants.ENTRY_NODE; if it plus count exceeds the SystemRule threshold, throw SystemBlockException
        double currentQps = Constants.ENTRY_NODE == null ? 0.0 : Constants.ENTRY_NODE.passQps();
        if (currentQps + count > qps) {
            throw new SystemBlockException(resourceWrapper.getName(), "qps");
        }
    
        // total thread
        // read the current thread count from Constants.ENTRY_NODE; if it exceeds the SystemRule threshold, throw SystemBlockException
        int currentThread = Constants.ENTRY_NODE == null ? 0 : Constants.ENTRY_NODE.curThreadNum();
        if (currentThread > maxThread) {
            throw new SystemBlockException(resourceWrapper.getName(), "thread");
        }
    
        // read the current average RT from Constants.ENTRY_NODE; if it exceeds the SystemRule threshold, throw SystemBlockException
        double rt = Constants.ENTRY_NODE == null ? 0 : Constants.ENTRY_NODE.avgRt();
        if (rt > maxRt) {
            throw new SystemBlockException(resourceWrapper.getName(), "rt");
        }
    
        // load. BBR algorithm.
        // BBR check: continue only if the load threshold is set and the current system load exceeds it
        if (highestSystemLoadIsSet && getCurrentSystemAvgLoad() > highestSystemLoad) {
            // As explained above, the system is balanced when T ≈ QPS * Avg(RT).
            // checkBbr estimates T = maxSuccessQps * minRt / 1000 (RT is in milliseconds, hence the division by 1000).
            // If the current thread count exceeds T, the request is blocked.
            if (!checkBbr(currentThread)) {
                throw new SystemBlockException(resourceWrapper.getName(), "load");
            }
        }
    
        // cpu usage
        // throw SystemBlockException if the current CPU usage exceeds the SystemRule threshold
        if (highestCpuUsageIsSet && getCurrentCpuUsage() > highestCpuUsage) {
            throw new SystemBlockException(resourceWrapper.getName(), "cpu");
        }
    }
    
    // BBR-style check
    private static boolean checkBbr(int currentThread) {
        if (currentThread > 1 &&
            currentThread > Constants.ENTRY_NODE.maxSuccessQps() * Constants.ENTRY_NODE.minRt() / 1000) {
            return false;
        }
        return true;
    }
  • Constants.ENTRY_NODE

    • Adaptive flow control uses the global ClusterNode, which means its dimension is the entire system.
    public final static ClusterNode ENTRY_NODE = new ClusterNode(TOTAL_IN_RESOURCE_NAME, ResourceTypeConstants.COMMON);
  • Running system metrics collection

    @SuppressWarnings("PMD.ThreadPoolCreationRule")
    private final static ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1,
        new NamedThreadFactory("sentinel-system-status-record-task", true));
    
    static {
        checkSystemStatus.set(false);
        statusListener = new SystemStatusListener();
        scheduler.scheduleAtFixedRate(statusListener, 0, 1, TimeUnit.SECONDS);
        currentProperty.addListener(listener);
    }

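    SystemRuleManager defines a ScheduledExecutorService, and its static block schedules SystemStatusListener to run once per second. In other words, Sentinel's adaptive protection collects system load and CPU information once every second.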
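To see the order of checks in checkSystem and the checkBbr formula at a glance, here is a simplified, hypothetical restatement. SystemCheckSketch and its check method are illustrative names, not Sentinel APIs; the metric values are passed in by the caller instead of being read from Constants.ENTRY_NODE, and the checkSystemStatus switch and SystemBlockException are left out (the method returns the reason string instead of throwing).

```java
// Hypothetical restatement of the order of checks in SystemRuleManager.checkSystem.
// Thresholds left at Double.MAX_VALUE / Long.MAX_VALUE mean "not configured",
// mirroring the defaults of SystemRuleManager's fields.
final class SystemCheckSketch {
    double qpsThreshold = Double.MAX_VALUE;
    long maxThread = Long.MAX_VALUE;
    long maxRt = Long.MAX_VALUE;
    double highestSystemLoad = Double.MAX_VALUE;
    double highestCpuUsage = Double.MAX_VALUE;

    /**
     * Returns the name of the first limit that is exceeded ("qps", "thread", "rt",
     * "load", "cpu"), or null if the request may pass.
     */
    String check(boolean inbound, int count, double passQps, int curThreads, double avgRtMs,
                 double load, double cpuUsage, double maxSuccessQps, double minRtMs) {
        if (!inbound) {
            return null; // only inbound (EntryType.IN) traffic is checked
        }
        if (passQps + count > qpsThreshold) {
            return "qps";
        }
        if (curThreads > maxThread) {
            return "thread";
        }
        if (avgRtMs > maxRt) {
            return "rt";
        }
        // BBR-style check, consulted only when load exceeds its threshold:
        // block if concurrency exceeds the estimated capacity maxSuccessQps * minRt / 1000.
        if (load > highestSystemLoad
                && curThreads > 1 && curThreads > maxSuccessQps * minRtMs / 1000) {
            return "load";
        }
        if (cpuUsage > highestCpuUsage) {
            return "cpu";
        }
        return null;
    }
}
```

With highestSystemLoad = 3.0, a load of 5.0, maxSuccessQps = 200 and minRt = 100 ms, the estimated capacity is 20 threads: 30 concurrent threads are blocked with "load", while 15 pass. Note that a high load alone does not block anything; the thread count must also exceed the capacity estimate.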

1.4.3 System metrics collection source code analysis

[Figure: SystemStatusListener class diagram]

This class implements the Runnable interface; a thread runs it once per second to collect the system load and CPU usage.

  • Source code analysis
public class SystemStatusListener implements Runnable {

    volatile double currentLoad = -1;
    volatile double currentCpuUsage = -1;

    volatile String reason = StringUtil.EMPTY;

    volatile long processCpuTime = 0;
    volatile long processUpTime = 0;

    public double getSystemAverageLoad() {
        return currentLoad;
    }

    public double getCpuUsage() {
        return currentCpuUsage;
    }

    @Override
    public void run() {
        try {
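            // Sentinel obtains system load, CPU and other metrics through the JDK's ManagementFactory
            // (OperatingSystemMXBean here is com.sun.management.OperatingSystemMXBean, which provides getSystemCpuLoad and getProcessCpuTime)
            OperatingSystemMXBean osBean = ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
            // current system load average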
            currentLoad = osBean.getSystemLoadAverage();

            /*
             * Java Doc copied from {@link OperatingSystemMXBean#getSystemCpuLoad()}:</br>
             * Returns the "recent cpu usage" for the whole system. This value is a double in the [0.0,1.0] interval.
             * A value of 0.0 means that all CPUs were idle during the recent period of time observed, while a value
             * of 1.0 means that all CPUs were actively running 100% of the time during the recent period being
             * observed. All values between 0.0 and 1.0 are possible depending of the activities going on in the
             * system. If the system recent cpu usage is not available, the method returns a negative value.
             */

            // current system-wide CPU usage
            double systemCpuUsage = osBean.getSystemCpuLoad();

            // calculate process cpu usage to support application running in container environment
            RuntimeMXBean runtimeBean = ManagementFactory.getPlatformMXBean(RuntimeMXBean.class);
            // CPU time used by the JVM process, in nanoseconds
            long newProcessCpuTime = osBean.getProcessCpuTime();
            // JVM uptime, in milliseconds
            long newProcessUpTime = runtimeBean.getUptime();
            // number of processors available to the JVM
            int cpuCores = osBean.getAvailableProcessors();
            long processCpuTimeDiffInMs = TimeUnit.NANOSECONDS
                    .toMillis(newProcessCpuTime - processCpuTime);
            long processUpTimeDiffInMs = newProcessUpTime - processUpTime;
            // process CPU usage over the last sampling interval
            double processCpuUsage = (double) processCpuTimeDiffInMs / processUpTimeDiffInMs / cpuCores;
            processCpuTime = newProcessCpuTime;
            processUpTime = newProcessUpTime;

            currentCpuUsage = Math.max(processCpuUsage, systemCpuUsage);

            if (currentLoad > SystemRuleManager.getSystemLoadThreshold()) {
                writeSystemStatusLog();
            }
        } catch (Throwable e) {
            RecordLog.warn("[SystemStatusListener] Failed to get system metrics from JMX", e);
        }
    }

    private void writeSystemStatusLog() {
        StringBuilder sb = new StringBuilder();
        sb.append("Load exceeds the threshold: ");
        sb.append("load:").append(String.format("%.4f", currentLoad)).append("; ");
        sb.append("cpuUsage:").append(String.format("%.4f", currentCpuUsage)).append("; ");
        sb.append("qps:").append(String.format("%.4f", Constants.ENTRY_NODE.passQps())).append("; ");
        sb.append("rt:").append(String.format("%.4f", Constants.ENTRY_NODE.avgRt())).append("; ");
        sb.append("thread:").append(Constants.ENTRY_NODE.curThreadNum()).append("; ");
        sb.append("success:").append(String.format("%.4f", Constants.ENTRY_NODE.successQps())).append("; ");
        sb.append("minRt:").append(String.format("%.2f", Constants.ENTRY_NODE.minRt())).append("; ");
        sb.append("maxSuccess:").append(String.format("%.2f", Constants.ENTRY_NODE.maxSuccessQps())).append("; ");
        RecordLog.info(sb.toString());
    }
}
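The arithmetic in SystemStatusListener.run() can be isolated into a small pure function, which makes it easier to follow and test. CpuUsageSketch and sample are hypothetical names; the caller supplies the readings that Sentinel takes from the MXBeans. The zero-division guard is an addition of this sketch; the original code does not have it.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper isolating the CPU-usage arithmetic of SystemStatusListener.run():
// processCpuUsage = (CPU time delta in ms) / (uptime delta in ms) / cores,
// and the reported value is the larger of that and the system-wide CPU usage.
final class CpuUsageSketch {
    private long lastProcessCpuTimeNs;
    private long lastUptimeMs;

    CpuUsageSketch(long initialProcessCpuTimeNs, long initialUptimeMs) {
        this.lastProcessCpuTimeNs = initialProcessCpuTimeNs;
        this.lastUptimeMs = initialUptimeMs;
    }

    /** Takes one sample and returns max(process CPU usage, system CPU usage). */
    double sample(long processCpuTimeNs, long uptimeMs, int cores, double systemCpuUsage) {
        long cpuDiffMs = TimeUnit.NANOSECONDS.toMillis(processCpuTimeNs - lastProcessCpuTimeNs);
        long upDiffMs = uptimeMs - lastUptimeMs;
        lastProcessCpuTimeNs = processCpuTimeNs;
        lastUptimeMs = uptimeMs;
        if (upDiffMs <= 0 || cores <= 0) {
            // Guard added for this sketch only: avoid dividing by zero.
            return systemCpuUsage;
        }
        double processCpuUsage = (double) cpuDiffMs / upDiffMs / cores;
        return Math.max(processCpuUsage, systemCpuUsage);
    }
}
```

For instance, if the JVM used 2000 ms of CPU time during 1000 ms of uptime on 4 cores, process usage is 2000 / 1000 / 4 = 0.5, which wins over a system reading of 0.3. Taking the maximum is what lets the check react to the process's own usage in containers, where the system-wide reading may not reflect it.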

1.5 Sentinel dashboard configuration

Configure the system protection rules in the dashboard:

[Figure: system protection rule configuration]

So how is the dashboard configuration mapped into Sentinel?

SystemRuleManager has a static method, loadRules(List<SystemRule> rules), that initializes the SystemRule configuration:

public static void loadRules(List<SystemRule> rules) {
    currentProperty.updateValue(rules);
}

Updating currentProperty actually notifies its listeners; this is the observer pattern:

public class DynamicSentinelProperty<T> implements SentinelProperty<T> {
    // registered listeners (observers)
    protected Set<PropertyListener<T>> listeners = Collections.synchronizedSet(new HashSet<PropertyListener<T>>());
    // current value
    private T value = null;

    // update the value
    @Override
    public boolean updateValue(T newValue) {
        // if the new value equals the current one, return false without changing anything
        if (isEqual(value, newValue)) {
            return false;
        }
        value = newValue;
        // notify every listener
        for (PropertyListener<T> listener : listeners) {
            listener.configUpdate(newValue);
        }
        return true;
    }

    // check whether the two values are equal
    private boolean isEqual(T oldValue, T newValue) {
        if (oldValue == null && newValue == null) {
            return true;
        }
        if (oldValue == null) {
            return false;
        }
        return oldValue.equals(newValue);
    }
    public void close() {
        listeners.clear();
    }
}
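The essence of DynamicSentinelProperty.updateValue is "notify listeners only when the value actually changes". Here is a minimal, hypothetical sketch of that observer pattern using java.util.function.Consumer as the listener type; SimpleProperty is an illustrative name, not a Sentinel class, and it does not model the rest of the SentinelProperty interface.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical observer-pattern sketch modelled on DynamicSentinelProperty:
// listeners are notified only when the stored value actually changes.
final class SimpleProperty<T> {
    private final List<Consumer<T>> listeners = new CopyOnWriteArrayList<>();
    private T value;

    void addListener(Consumer<T> listener) {
        listeners.add(listener);
    }

    /** Returns false (and notifies nobody) if newValue equals the current value. */
    boolean updateValue(T newValue) {
        if (Objects.equals(value, newValue)) {
            return false; // unchanged: no notification
        }
        value = newValue;
        for (Consumer<T> listener : listeners) {
            listener.accept(newValue);
        }
        return true;
    }
}
```

Pushing the same value twice triggers only one notification, so re-publishing an identical rule list from the dashboard does not make SystemRuleManager reset and reload its thresholds.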

The listener here is SystemPropertyListener, which is registered in SystemRuleManager's static block.

public class SystemRuleManager {
    // listener notified whenever the SystemRule configuration changes
    private final static SystemPropertyListener listener = new SystemPropertyListener();
    static {
        checkSystemStatus.set(false);
        statusListener = new SystemStatusListener();
        scheduler.scheduleAtFixedRate(statusListener, 5, 1, TimeUnit.SECONDS);
        // register the listener
        currentProperty.addListener(listener);
    }
}

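When SystemPropertyListener handles an update, it first disables system checking and restores the default settings; once the new configuration has been loaded, checking is enabled again.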

 static class SystemPropertyListener extends SimplePropertyListener<List<SystemRule>> {
    @Override
    public void configUpdate(List<SystemRule> rules) {
        // restore the default state
        restoreSetting();
        if (rules != null && rules.size() >= 1) {
            for (SystemRule rule : rules) {
                // load the configuration
                loadSystemConf(rule);
            }
        } else {
            checkSystemStatus.set(false);
        }
    }
    // reset the configuration
    protected void restoreSetting() {
        checkSystemStatus.set(false);
        // should restore changes
        highestSystemLoad = Double.MAX_VALUE;
        highestCpuUsage = Double.MAX_VALUE;
        maxRt = Long.MAX_VALUE;
        maxThread = Long.MAX_VALUE;
        qps = Double.MAX_VALUE;
        highestSystemLoadIsSet = false;
        maxRtIsSet = false;
        maxThreadIsSet = false;
        qpsIsSet = false;
    }
}

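Loading then compares each configured value with the current one using Math.min. Because the settings were just reset to their defaults (MAX_VALUE), any configured non-negative value is smaller and takes effect; if several rules set the same metric, the smallest (strictest) value wins.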

public static void loadSystemConf(SystemRule rule) {
    boolean checkStatus = false;
    if (rule.getHighestSystemLoad() >= 0) {
        highestSystemLoad = Math.min(highestSystemLoad, rule.getHighestSystemLoad());
        highestSystemLoadIsSet = true;
        checkStatus = true;
    }
    if (rule.getHighestCpuUsage() >= 0) {
        highestCpuUsage = Math.min(highestCpuUsage, rule.getHighestCpuUsage());
        highestCpuUsageIsSet = true;
        checkStatus = true;
    }
    if (rule.getAvgRt() >= 0) {
        maxRt = Math.min(maxRt, rule.getAvgRt());
        maxRtIsSet = true;
        checkStatus = true;
    }
    if (rule.getMaxThread() >= 0) {
        maxThread = Math.min(maxThread, rule.getMaxThread());
        maxThreadIsSet = true;
        checkStatus = true;
    }
    if (rule.getQps() >= 0) {
        qps = Math.min(qps, rule.getQps());
        qpsIsSet = true;
        checkStatus = true;
    }
    checkSystemStatus.set(checkStatus);
}

Once the configuration is loaded, checkSystemStatus is set to true if the rule configured at least one threshold (a non-negative value).
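The reset-then-Math.min merge can be sketched in isolation. RuleMergeSketch and its nested Rule class are hypothetical and model only two metrics (QPS and max threads); as in SystemRule, a negative value means "not set". One deliberate difference: in the excerpt above, each loadSystemConf call sets checkSystemStatus from that rule alone, so the last rule decides, whereas this sketch enables checking if any rule sets a threshold.

```java
import java.util.List;

// Hypothetical sketch of merging several system rules:
// reset to defaults, then keep the smallest (strictest) configured value per metric.
final class RuleMergeSketch {

    static final class Rule {
        final double qps;      // negative = not set
        final long maxThread;  // negative = not set

        Rule(double qps, long maxThread) {
            this.qps = qps;
            this.maxThread = maxThread;
        }
    }

    double qps = Double.MAX_VALUE;
    long maxThread = Long.MAX_VALUE;
    boolean checkEnabled = false;

    void load(List<Rule> rules) {
        // Reset to defaults first, as restoreSetting() does.
        qps = Double.MAX_VALUE;
        maxThread = Long.MAX_VALUE;
        checkEnabled = false;
        for (Rule rule : rules) {
            if (rule.qps >= 0) {
                qps = Math.min(qps, rule.qps); // strictest value wins
                checkEnabled = true;
            }
            if (rule.maxThread >= 0) {
                maxThread = Math.min(maxThread, rule.maxThread);
                checkEnabled = true;
            }
        }
    }
}
```

Loading the rules (qps 50), (qps 20, maxThread 10) and (nothing set) yields qps = 20 and maxThread = 10 with checking enabled; loading an empty list restores the defaults and disables checking.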

1.6 Summary

From the source code analysis above, we can draw several conclusions:

  • Sentinel's adaptive flow control is based on the idea of the BBR algorithm.
  • Its dimension is the whole system: flow control is triggered when the system is overloaded.
  • System metrics are collected once per second.
  • The metrics Sentinel collects are at the operating-system level, so if you use adaptive flow control, avoid deploying other high-load services on the same application server.

2. AuthoritySlot

AuthoritySlot performs blacklist/whitelist control. If an AuthorityRule is configured for a resource, the slot uses the rule's strategy to decide whether the request's origin appears in the rule's limitApp (multiple apps separated by commas) and therefore whether the check passes.

  • If the strategy is whitelist:

    • the check passes (returns true) if the origin is in limitApp, and fails (returns false) otherwise.
  • If the strategy is blacklist:

    • the check fails (returns false) if the origin is in limitApp, and passes (returns true) otherwise.
public class AuthoritySlot extends AbstractLinkedProcessorSlot<DefaultNode> {
    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count, boolean prioritized, Object... args)
        throws Throwable {
        // check the blacklist/whitelist rules
        checkBlackWhiteAuthority(resourceWrapper, context);
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }

    @Override
    public void exit(Context context, ResourceWrapper resourceWrapper, int count, Object... args) {
        fireExit(context, resourceWrapper, count, args);
    }

    void checkBlackWhiteAuthority(ResourceWrapper resource, Context context) throws AuthorityException {
        // get all authority rules
        Map<String, List<AuthorityRule>> authorityRules = AuthorityRuleManager.getAuthorityRules();
        if (authorityRules == null) {
            return;
        }
        // get the rules configured for this resource name
        List<AuthorityRule> rules = authorityRules.get(resource.getName());
        if (rules == null) {
            return;
        }
        for (AuthorityRule rule : rules) {
            // authority check
            if (!AuthorityRuleChecker.passCheck(rule, context)) {
                throw new AuthorityException(context.getOrigin(), rule);
            }
        }
    }
}

The check logic is in AuthorityRuleChecker:

final class AuthorityRuleChecker {

    static boolean passCheck(AuthorityRule rule, Context context) {

        String requester = context.getOrigin();
        // get the request origin; pass directly if the origin or limitApp is empty
        if (StringUtil.isEmpty(requester) || StringUtil.isEmpty(rule.getLimitApp())) {
            return true;
        }

        // check whether limitApp contains the origin: substring test first, then exact match per comma-separated entry
        int pos = rule.getLimitApp().indexOf(requester);
        boolean contain = pos > -1;
        if (contain) {
            boolean exactlyMatch = false;
            String[] appArray = rule.getLimitApp().split(",");
            for (String app : appArray) {
                if (requester.equals(app)) {
                    exactlyMatch = true;
                    break;
                }
            }

            contain = exactlyMatch;
        }
        // decide pass/fail from the strategy and whether the origin is contained
        int strategy = rule.getStrategy();
        if (strategy == RuleConstant.AUTHORITY_BLACK && contain) {
            return false;
        }

        if (strategy == RuleConstant.AUTHORITY_WHITE && !contain) {
            return false;
        }
        return true;
    }

    private AuthorityRuleChecker() {}
}
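The decision logic of passCheck can be restated with plain strings, which makes the reason for the exact-match step visible. AuthorityCheckSketch is a hypothetical class, and its WHITE/BLACK constants are stand-ins for RuleConstant.AUTHORITY_WHITE and AUTHORITY_BLACK. The sketch skips the indexOf pre-check, which only serves as a fast path: an exact match always implies a substring match, so the result is the same.

```java
// Hypothetical restatement of AuthorityRuleChecker.passCheck using plain strings.
// WHITE and BLACK are stand-ins for the RuleConstant strategy values.
final class AuthorityCheckSketch {
    static final int WHITE = 0;
    static final int BLACK = 1;

    private AuthorityCheckSketch() {}

    static boolean passCheck(String origin, String limitApp, int strategy) {
        if (origin == null || origin.isEmpty() || limitApp == null || limitApp.isEmpty()) {
            return true; // no origin or no limitApp: always pass
        }
        // Exact match against each comma-separated entry,
        // so an origin "app" does not match an entry "appA".
        boolean contain = false;
        for (String app : limitApp.split(",")) {
            if (origin.equals(app)) {
                contain = true;
                break;
            }
        }
        if (strategy == BLACK && contain) {
            return false; // blacklisted origin
        }
        if (strategy == WHITE && !contain) {
            return false; // origin missing from the whitelist
        }
        return true;
    }
}
```

With limitApp "appA,appB", the origin "app" is a substring of the list but not an exact entry, so a whitelist rule rejects it; this is exactly the case the exact-match loop in passCheck guards against.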

AuthorityRule configuration is updated the same way as for SystemSlot, through AuthorityRuleManager.loadRules.

2.1 Sentinel dashboard configuration


In the Sentinel 1.8 source code, the origin is an empty string by default when a Context is created, so to verify authority control we need to change it to a specific origin. The code to change is com.alibaba.csp.sentinel.CtSph.InternalContextUtil#internalEnter(java.lang.String). Without modifying the source, an origin can also be supplied by calling ContextUtil.enter(contextName, origin) before SphU.entry.

[Figure: modified internalEnter code]



Original article: juejin.im/post/7150074488195907615