Foreword
- This article is based on the microservices part of the Heima ("Dark Horse") course. Since the reference document is very detailed, I share the original document directly.
- This article is for learning and exchange only. If anything here infringes your rights, please contact the author promptly so it can be dealt with.
Sentinel source code analysis
1. Basic concepts of Sentinel
Sentinel implements rate limiting, isolation, degradation, and circuit breaking. In essence, it does two things:
- Collecting statistics: gathering access data for a resource (QPS, RT, etc.)
- Rule checking: determining whether the flow-control, isolation, degradation, and circuit-breaking rules are satisfied
A resource here is any piece of business logic you want Sentinel to protect. For example, the controller methods defined in the project are protected as Sentinel resources by default.
1.1. ProcessorSlotChain
The core skeleton behind these functions is a class called ProcessorSlotChain. It is designed around the chain-of-responsibility pattern: each function (rate limiting, degradation, system protection, and so on) is encapsulated in its own slot, and the slots are executed one after another once a request enters.
Its workflow is shown in the figure:
The slots in the chain fall into two categories:
- Statistics (statistic)
  - NodeSelectorSlot: builds the nodes (DefaultNode) of the invocation chain and organizes them into an invocation tree
  - ClusterBuilderSlot: builds the ClusterNode of a resource. A ClusterNode stores the resource's runtime statistics (response time, QPS, block count, thread count, exception count, etc.) and origin information (origin name)
  - StatisticSlot: collects real-time call statistics, including runtime information and origin information
- Rule checking (rule checking)
  - AuthoritySlot: enforces authorization rules (origin control)
  - SystemSlot: enforces system protection rules
  - ParamFlowSlot: enforces hot-parameter flow-control rules
  - FlowSlot: enforces flow-control rules
  - DegradeSlot: enforces degradation (circuit-breaking) rules
1.2. Node
An invocation chain in Sentinel is made up of Nodes. Node is an interface with the following implementations:
All nodes record access statistics for resources, which is why they are all subclasses of StatisticNode.
By function, there are two kinds of Node:
- DefaultNode: represents one resource within the invocation tree. When a resource appears in different chains, a separate DefaultNode is created for each chain. The root entry node of the tree is called EntranceNode, which is a special kind of DefaultNode
- ClusterNode: represents a resource as a whole. No matter how many chains a resource appears in, it has exactly one ClusterNode, which records the aggregate statistics of all accesses to that resource.
DefaultNode records a resource's access data within the current chain and is used to implement flow rules in chain mode. ClusterNode records a resource's access data across all chains and is used to implement flow rules in the default (direct) mode and the related (association) mode.
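To make the split between the two node types concrete, here is a minimal, self-contained model. `SketchDefaultNode`, `SketchClusterNode` and `NodeModel` are hypothetical stand-ins, not Sentinel classes. Each (chain, resource) pair gets its own DefaultNode-like counter, and every increment is also forwarded to the resource's single ClusterNode-like counter. It uses the same two chains as the example that follows: /order/query and /order/save, both reaching /goods.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ClusterNode: one per resource, shared by all chains.
class SketchClusterNode {
    long passCount;
}

// Hypothetical stand-in for DefaultNode: one per (chain, resource) pair.
class SketchDefaultNode {
    long passCount;
    final SketchClusterNode clusterNode;

    SketchDefaultNode(SketchClusterNode clusterNode) {
        this.clusterNode = clusterNode;
    }

    // Count for this chain, then forward to the resource-wide total.
    void addPassRequest(int count) {
        passCount += count;
        clusterNode.passCount += count;
    }
}

class NodeModel {
    final Map<String, SketchClusterNode> clusterNodes = new HashMap<>(); // resource -> node
    final Map<String, SketchDefaultNode> defaultNodes = new HashMap<>(); // "chain|resource" -> node

    SketchDefaultNode nodeFor(String chain, String resource) {
        SketchClusterNode cluster = clusterNodes.computeIfAbsent(resource, r -> new SketchClusterNode());
        return defaultNodes.computeIfAbsent(chain + "|" + resource, k -> new SketchDefaultNode(cluster));
    }

    static NodeModel demo() {
        NodeModel model = new NodeModel();
        // Two requests reach /goods via /order/query, one via /order/save.
        model.nodeFor("/order/query", "/goods").addPassRequest(1);
        model.nodeFor("/order/query", "/goods").addPassRequest(1);
        model.nodeFor("/order/save", "/goods").addPassRequest(1);
        return model;
    }
}
```

In this model, a chain-mode rule would look at the per-chain counts (2 and 1). A direct-mode rule would look at the resource total (3).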
For example, suppose a Spring MVC project has two business flows:
- Business 1: the controller resource /order/query accesses the service resource /goods
- Business 2: the controller resource /order/save accesses the service resource /goods
The resulting invocation chain looks like this:
1.3. Entry
By default, Sentinel treats controller methods as protected resources. So how do we mark an arbitrary piece of code as a Sentinel resource?
In Sentinel, a resource is represented by an Entry. Here is an example of the API for declaring an Entry:
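// The resource name can be any string with business meaning, such as a method name, an API name, or any other uniquely identifying string.
try (Entry entry = SphU.entry("resourceName")) {
    // Protected business logic
    // do something here...
} catch (BlockException ex) {
    // Access to the resource was blocked (rate-limited or degraded)
    // Handle the blocked request here
}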
1.3.1. Custom resources
For example, in the order-service service, we mark the queryOrderById() method of OrderService as a resource.
1) First, add the Sentinel dependency to order-service
<!--sentinel-->
<dependency>
    <groupId>com.alibaba.cloud</groupId>
    <artifactId>spring-cloud-starter-alibaba-sentinel</artifactId>
</dependency>
2) Then configure the Sentinel dashboard address
spring:
  cloud:
    sentinel:
      transport:
        dashboard: localhost:8089 # my Sentinel dashboard runs on port 8089
3) Modify the queryOrderById method of the OrderService class
The code looks like this:
public Order queryOrderById(Long orderId) {
    // Create an Entry to mark the resource; the resource name is resource1
    try (Entry entry = SphU.entry("resource1")) {
        // 1. Query the order (mock data here)
        Order order = Order.build(101L, 4999L, "Xiaomi MIX4", 1, 1L, null);
        // 2. Query the user via a Feign remote call
        User user = userClient.findById(order.getUserId());
        // 3. Attach the user to the order
        order.setUser(user);
        // 4. Return it
        return order;
    } catch (BlockException e) {
        log.error("Blocked by rate limiting or degradation", e);
        return null;
    }
}
4) Test it
Open a browser and visit the order service: http://localhost:8080/order/101
Then open the Sentinel dashboard to view the invocation chain:
1.3.2. Marking resources based on annotations
When we studied Sentinel earlier, we saw that a method can be marked as a resource by adding the @SentinelResource annotation to it.
How is this achieved?
Let's look at the Sentinel dependency we added:
Its spring.factories file declares the classes to be auto-configured, as follows:
Let's look at the SentinelAutoConfiguration class:
As you can see, it declares a bean of type SentinelResourceAspect:
/**
 * Aspect for methods with {@link SentinelResource} annotation.
 *
 * @author Eric Zhao
 */
@Aspect
public class SentinelResourceAspect extends AbstractSentinelAspectSupport {
    // The pointcut matches methods annotated with @SentinelResource
    @Pointcut("@annotation(com.alibaba.csp.sentinel.annotation.SentinelResource)")
    public void sentinelResourceAnnotationPointcut() {
    }
    // Around advice
    @Around("sentinelResourceAnnotationPointcut()")
    public Object invokeResourceWithSentinel(ProceedingJoinPoint pjp) throws Throwable {
        // Resolve the protected method
        Method originMethod = resolveMethod(pjp);
        // Get the @SentinelResource annotation
        SentinelResource annotation = originMethod.getAnnotation(SentinelResource.class);
        if (annotation == null) {
            // Should not go through here.
            throw new IllegalStateException("Wrong state for SentinelResource annotation");
        }
        // Resolve the resource name from the annotation
        String resourceName = getResourceName(annotation.value(), originMethod);
        EntryType entryType = annotation.entryType();
        int resourceType = annotation.resourceType();
        Entry entry = null;
        try {
            // Create the resource Entry
            entry = SphU.entry(resourceName, resourceType, entryType, pjp.getArgs());
            // Invoke the protected method
            Object result = pjp.proceed();
            return result;
        } catch (BlockException ex) {
            return handleBlockException(pjp, annotation, ex);
        } catch (Throwable ex) {
            Class<? extends Throwable>[] exceptionsToIgnore = annotation.exceptionsToIgnore();
            // The ignore list will be checked first.
            if (exceptionsToIgnore.length > 0 && exceptionBelongsTo(ex, exceptionsToIgnore)) {
                throw ex;
            }
            if (exceptionBelongsTo(ex, annotation.exceptionsToTrace())) {
                traceException(ex);
                return handleFallback(pjp, annotation, ex);
            }
            // No fallback function can handle the exception, so throw it out.
            throw ex;
        } finally {
            if (entry != null) {
                entry.exit(1, pjp.getArgs());
            }
        }
    }
}
In short, @SentinelResource is just a marker. Sentinel applies AOP around advice to the marked method, and that advice creates the resource's Entry.
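Sentinel's aspect relies on Spring AOP/AspectJ. As a dependency-free illustration of the same around-advice idea, the sketch below uses a JDK dynamic proxy (`java.lang.reflect.Proxy`) instead. The `@Guarded` annotation, the `Gate` permit counter and the constant fallback value are all hypothetical. A method carrying the marker is wrapped in an "enter, invoke, exit" sequence, and a rejected call returns a fallback value without reaching the real method.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical marker playing the role of @SentinelResource.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Guarded {
    String value();    // resource name
    String fallback(); // value returned when the call is rejected
}

interface OrderApi {
    @Guarded(value = "queryOrder", fallback = "blocked")
    String queryOrder(long id);

    String ping(); // not annotated, so never guarded
}

// Hypothetical "entry" check: admits at most `permits` concurrent calls.
class Gate {
    private final int permits;
    private int inFlight;

    Gate(int permits) { this.permits = permits; }

    synchronized boolean tryEnter() {
        if (inFlight >= permits) {
            return false;
        }
        inFlight++;
        return true;
    }

    synchronized void exit() { inFlight--; }
}

class GuardedProxy {
    static <T> T wrap(Class<T> api, T target, Gate gate) {
        Object proxy = Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[]{api},
            (p, method, args) -> {
                Guarded marker = method.getAnnotation(Guarded.class);
                if (marker == null) {
                    return call(method, target, args); // unmarked: no advice
                }
                if (!gate.tryEnter()) {
                    return marker.fallback();          // rejected: like handleBlockException
                }
                try {
                    return call(method, target, args); // like pjp.proceed()
                } finally {
                    gate.exit();                       // like entry.exit() in the finally block
                }
            });
        return api.cast(proxy);
    }

    private static Object call(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the business exception itself
        }
    }
}
```

Unlike Sentinel, the fallback here is a constant string. @SentinelResource instead names blockHandler/fallback methods to call.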
1.4. Context
In the previous section, we saw that besides the two resources (the controller method and the service method), the invocation chain also contains a default entry node:
sentinel_spring_web_context is a node of type EntranceNode.
Sentinel creates this node for us when it initializes the Context.
1.4.1. What is Context
So, what is Context?
- Context represents the context of an invocation chain. It spans all resources (Entry) in that chain and is stored in a ThreadLocal.
- Context maintains information such as the chain's entrance node (entranceNode), the current resource node (curNode), and the call origin (origin).
- Later slots can get the DefaultNode or ClusterNode through the Context, read their statistics, and perform rule checks.
- While the Context is being initialized, an EntranceNode is created (or reused from the cache), and the contextName becomes the name of that EntranceNode.
The corresponding API is:
// Create a Context with two arguments: the context name and the origin name
ContextUtil.enter("contextName", "originName");
1.4.2. Context initialization
So when is this Context initialized?
1.4.2.1. Auto-configuration
Let's look at the Sentinel dependency we added:
Its spring.factories file declares the classes to be auto-configured, as follows:
Let's first look at the SentinelWebAutoConfiguration class:
This class implements WebMvcConfigurer, the interface used to customize Spring MVC configuration, which lets it register a HandlerInterceptor:
You can see that it registers an interceptor here: SentinelWebInterceptor.
SentinelWebInterceptor is declared as follows:
It extends the AbstractSentinelInterceptor class.
As a HandlerInterceptor, it intercepts every request entering a controller method and runs its preHandle method first. The Context is initialized there.
1.4.2.2. AbstractSentinelInterceptor
Let's look at this class's preHandle implementation:
@Override
public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
    throws Exception {
    try {
        // Get the resource name, usually the @RequestMapping path of the controller method, e.g. /order/{orderId}
        String resourceName = getResourceName(request);
        if (StringUtil.isEmpty(resourceName)) {
            return true;
        }
        // Parse the request origin; it is used later for authorization rule checks
        String origin = parseOrigin(request);
        // Get the contextName, sentinel_spring_web_context by default
        String contextName = getContextName(request);
        // Create the Context
        ContextUtil.enter(contextName, origin);
        // Create the resource Entry; its name is the mapping path of the controller method handling this request
        Entry entry = SphU.entry(resourceName, ResourceTypeConstants.COMMON_WEB, EntryType.IN);
        request.setAttribute(baseWebMvcConfig.getRequestAttributeName(), entry);
        return true;
    } catch (BlockException e) {
        try {
            handleBlockException(request, response, e);
        } finally {
            ContextUtil.exit();
        }
        return false;
    }
}
1.4.2.3. ContextUtil
The Context is created by ContextUtil.enter(contextName, origin);
Let's step into that method:
public static Context enter(String name, String origin) {
    if (Constants.CONTEXT_DEFAULT_NAME.equals(name)) {
        throw new ContextNameDefineException(
            "The " + Constants.CONTEXT_DEFAULT_NAME + " can't be permit to defined!");
    }
    return trueEnter(name, origin);
}
Then step into trueEnter:
protected static Context trueEnter(String name, String origin) {
    // Try to get the Context bound to the current thread
    Context context = contextHolder.get();
    // Null check
    if (context == null) {
        // No Context yet: start initializing one
        Map<String, DefaultNode> localCacheNameMap = contextNameNodeMap;
        // Try to get the entrance node from the cache
        DefaultNode node = localCacheNameMap.get(name);
        if (node == null) {
            LOCK.lock();
            try {
                node = contextNameNodeMap.get(name);
                if (node == null) {
                    // No entrance node yet: create an EntranceNode
                    node = new EntranceNode(new StringResourceWrapper(name, EntryType.IN), null);
                    // Attach the entrance node to ROOT
                    Constants.ROOT.addChild(node);
                    // Put the entrance node into the cache (copy-on-write)
                    Map<String, DefaultNode> newMap = new HashMap<>(contextNameNodeMap.size() + 1);
                    newMap.putAll(contextNameNodeMap);
                    newMap.put(name, node);
                    contextNameNodeMap = newMap;
                }
            } finally {
                LOCK.unlock();
            }
        }
        // Create the Context from the entrance node and the contextName
        context = new Context(node, name);
        // Set the request origin
        context.setOrigin(origin);
        // Store it in the ThreadLocal
        contextHolder.set(context);
    }
    // Return the Context
    return context;
}
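trueEnter combines two pieces: a per-thread ThreadLocal holder, and a shared copy-on-write cache of entrance nodes guarded by double-checked locking. The sketch below reproduces only that pattern, using hypothetical `SketchContext`, `SketchEntranceNode` and `SketchContextUtil` classes. It is not Sentinel's code, and it leaves out details such as the limit on the number of context names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for EntranceNode.
final class SketchEntranceNode {
    final String name;
    SketchEntranceNode(String name) { this.name = name; }
}

// Hypothetical stand-in for Context.
class SketchContext {
    final String name;
    final SketchEntranceNode entranceNode;
    String origin;

    SketchContext(String name, SketchEntranceNode entranceNode) {
        this.name = name;
        this.entranceNode = entranceNode;
    }
}

class SketchContextUtil {
    private static final ThreadLocal<SketchContext> HOLDER = new ThreadLocal<>();
    private static final ReentrantLock LOCK = new ReentrantLock();
    // Readers use the current snapshot without locking; writers replace the whole map.
    private static volatile Map<String, SketchEntranceNode> entranceNodes = new HashMap<>();

    static SketchContext enter(String name, String origin) {
        SketchContext context = HOLDER.get();
        if (context != null) {
            return context; // this thread is already inside a context
        }
        SketchEntranceNode node = entranceNodes.get(name);
        if (node == null) {
            LOCK.lock();
            try {
                node = entranceNodes.get(name); // double-check under the lock
                if (node == null) {
                    node = new SketchEntranceNode(name);
                    Map<String, SketchEntranceNode> copy = new HashMap<>(entranceNodes);
                    copy.put(name, node);
                    entranceNodes = copy;       // publish the new snapshot
                }
            } finally {
                LOCK.unlock();
            }
        }
        context = new SketchContext(name, node);
        context.origin = origin;
        HOLDER.set(context);
        return context;
    }

    static void exit() { HOLDER.remove(); }

    static SketchContext current() { return HOLDER.get(); }
}
```

Reads take no lock because a published map is never mutated again. The volatile write makes each new snapshot visible to other threads. Entering twice on the same thread returns the same Context, while a new Context created under the same name reuses the cached entrance node.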
2. ProcessorSlotChain execution process
Next, we trace the source code to verify how ProcessorSlotChain executes.
2.1. Entry point
First, back to where everything starts: the preHandle method of the AbstractSentinelInterceptor class:
And the around advice method of SentinelResourceAspect:
As you can see, every resource has to go through the SphU.entry() method:
public static Entry entry(String name, int resourceType, EntryType trafficType, Object[] args)
    throws BlockException {
    return Env.sph.entryWithType(name, resourceType, trafficType, 1, args);
}
Continue into Env.sph.entryWithType(name, resourceType, trafficType, 1, args). That call reaches the following method through an overload that passes prioritized = false:
@Override
public Entry entryWithType(String name, int resourceType, EntryType entryType, int count, boolean prioritized,
                           Object[] args) throws BlockException {
    // Wrap basic information such as the resource name in a StringResourceWrapper
    StringResourceWrapper resource = new StringResourceWrapper(name, entryType, resourceType);
    // Continue
    return entryWithPriority(resource, count, prioritized, args);
}
Then step into entryWithPriority:
private Entry entryWithPriority(ResourceWrapper resourceWrapper, int count, boolean prioritized, Object... args)
    throws BlockException {
    // Get the Context
    Context context = ContextUtil.getContext();
    if (context == null) {
        // Using default context.
        context = InternalContextUtil.internalEnter(Constants.CONTEXT_DEFAULT_NAME);
    }
    // Get the slot chain: one chain is created per resource and cached
    ProcessorSlot<Object> chain = lookProcessChain(resourceWrapper);
    // Create the Entry, recording the resource, chain, and context in it
    Entry e = new CtEntry(resourceWrapper, chain, context);
    try {
        // Run the slot chain
        chain.entry(context, resourceWrapper, null, count, prioritized, args);
    } catch (BlockException e1) {
        e.exit(count, args);
        throw e1;
    } catch (Throwable e1) {
        // This should not happen, unless there are errors existing in Sentinel internal.
        RecordLog.info("Sentinel unexpected exception", e1);
    }
    return e;
}
This code obtains the ProcessorSlotChain object and then runs each slot in the chain via chain.entry(). The implementation class obtained here is DefaultProcessorSlotChain.
Once a ProcessorSlotChain has been built, it is cached in a Map whose key is the ResourceWrapper and whose value is the ProcessorSlotChain.
Therefore, each resource has exactly one ProcessorSlotChain.
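A minimal sketch of the "one chain per resource" cache follows. `ChainCache` and its chain factory are hypothetical. Sentinel's own lookProcessChain uses a synchronized copy-on-write HashMap keyed by ResourceWrapper, whose equality is based on the resource name. As far as I know, it also stops creating chains past a size cap (Constants.MAX_SLOT_CHAIN_SIZE) and returns null, in which case no rules are checked for the new resource. Here `ConcurrentHashMap.computeIfAbsent` is used instead because it gives the same "build at most once per key" guarantee in fewer lines.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical cache: one slot chain per resource name.
class ChainCache<C> {
    private final Map<String, C> chains = new ConcurrentHashMap<>();
    private final Supplier<C> buildChain; // hypothetical chain factory
    private final int maxChains;          // cap on distinct resources

    ChainCache(Supplier<C> buildChain, int maxChains) {
        this.buildChain = buildChain;
        this.maxChains = maxChains;
    }

    // Returns the resource's chain, building it on first use, or null once the cap is reached.
    C lookup(String resourceName) {
        C chain = chains.get(resourceName);
        if (chain != null) {
            return chain;
        }
        if (chains.size() >= maxChains) {
            return null; // approximate under concurrency; good enough for a sketch
        }
        return chains.computeIfAbsent(resourceName, name -> buildChain.get());
    }
}
```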
2.2. DefaultProcessorSlotChain
Step into the entry method of DefaultProcessorSlotChain:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, Object t, int count, boolean prioritized, Object... args)
    throws Throwable {
    // first is the head of the chain of responsibility
    first.transformEntry(context, resourceWrapper, t, count, prioritized, args);
}
Here, first is of type AbstractLinkedProcessorSlot:
Its inheritance hierarchy:
So first must be one of these implementation classes. Going by the slot order described earlier, you might expect first to be NodeSelectorSlot.
However, the chain is a linked list, so each slot only keeps a reference to the slot after it (next). first is actually a placeholder head node that simply forwards to its next:
and that next is indeed of type NodeSelectorSlot.
The next of NodeSelectorSlot is in turn ClusterBuilderSlot, and so on:
Together they form the chain of responsibility.
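The linked structure can be sketched in a few lines. The classes below are simplified, hypothetical versions of AbstractLinkedProcessorSlot and DefaultProcessorSlotChain. The chain has a placeholder head `first`, an `addLast` that appends after `end`, and a `fireEntry` that hands control to `next`. Each slot records its name so the traversal order is visible.

```java
import java.util.ArrayList;
import java.util.List;

abstract class LinkedSlot {
    LinkedSlot next;

    abstract void entry(List<String> trace);

    // Hand the request to the next slot, if any (like fireEntry).
    void fireEntry(List<String> trace) {
        if (next != null) {
            next.entry(trace);
        }
    }
}

// A slot that records its name and continues down the chain.
class NamedSlot extends LinkedSlot {
    private final String name;

    NamedSlot(String name) { this.name = name; }

    @Override
    void entry(List<String> trace) {
        trace.add(name);
        fireEntry(trace);
    }
}

class SketchSlotChain {
    // Placeholder head: does nothing itself, only forwards to next.
    private final LinkedSlot first = new LinkedSlot() {
        @Override
        void entry(List<String> trace) { fireEntry(trace); }
    };
    private LinkedSlot end = first;

    SketchSlotChain addLast(LinkedSlot slot) {
        end.next = slot;
        end = slot;
        return this;
    }

    List<String> entry() {
        List<String> trace = new ArrayList<>();
        first.entry(trace);
        return trace;
    }
}
```

Because each slot decides whether to call `fireEntry`, a rule slot can stop the request simply by throwing instead of forwarding.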
2.3. NodeSelectorSlot
NodeSelectorSlot builds the nodes (DefaultNode) of the invocation chain and organizes them into an invocation tree.
Core code:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
    throws Throwable {
    // Try to get this resource's DefaultNode for the current context
    DefaultNode node = map.get(context.getName());
    if (node == null) {
        synchronized (this) {
            node = map.get(context.getName());
            if (node == null) {
                // None yet: create a new DefaultNode for the current resource
                node = new DefaultNode(resourceWrapper, null);
                HashMap<String, DefaultNode> cacheMap = new HashMap<String, DefaultNode>(map.size());
                cacheMap.putAll(map);
                // Cache it. Note that the key is the contextName,
                // so the same resource entered through different chains gets a separate DefaultNode per chain
                cacheMap.put(context.getName(), node);
                map = cacheMap;
                // Add the node as a child of the previous node, which builds up the invocation tree
                ((DefaultNode) context.getLastNode()).addChild(node);
            }
        }
    }
    // Set the context's curNode (current node) to this node
    context.setCurNode(node);
    // Fire the next slot
    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}
This slot does the following:
- Creates a DefaultNode for the current resource
- Caches the DefaultNode keyed by contextName, so requests coming in through different entry chains get separate DefaultNodes, while requests through the same chain share one
- Adds the current resource's DefaultNode as a child of the previous resource's node
- Sets the current resource's DefaultNode as the Context's curNode (current node)
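Below is a simplified, hypothetical model of what this slot does to the tree. There is one `NodeSelector` per resource, matching one slot chain per resource. It keeps a map from context name to that resource's node, and it attaches a newly created node as a child of the context's current node. Statistics and the thread-safe copy-on-write update are left out. The model is single-threaded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TreeNode {
    final String label;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String label) { this.label = label; }
}

// Minimal context: remembers the node the call chain is currently at.
class ChainContext {
    final String name;
    TreeNode curNode;

    ChainContext(String name, TreeNode entrance) {
        this.name = name;
        this.curNode = entrance;
    }
}

// One instance per resource, like NodeSelectorSlot inside that resource's slot chain.
class NodeSelector {
    private final String resource;
    private final Map<String, TreeNode> byContext = new HashMap<>();

    NodeSelector(String resource) { this.resource = resource; }

    TreeNode enter(ChainContext ctx) {
        TreeNode node = byContext.get(ctx.name);
        if (node == null) {
            node = new TreeNode(resource);
            byContext.put(ctx.name, node);  // keyed by context name
            ctx.curNode.children.add(node); // attach under the previous node
        }
        ctx.curNode = node;                 // like context.setCurNode(node)
        return node;
    }
}
```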
The next slot is ClusterBuilderSlot.
2.4. ClusterBuilderSlot
ClusterBuilderSlot builds the ClusterNode of a resource. Core code:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node,
                  int count, boolean prioritized, Object... args)
    throws Throwable {
    // Null check. clusterNode is a field of this slot, and each resource has its own slot chain,
    // so a resource has exactly one ClusterNode, no matter which chain it is reached through
    if (clusterNode == null) {
        synchronized (lock) {
            if (clusterNode == null) {
                // Create the cluster node.
                clusterNode = new ClusterNode(resourceWrapper.getName(), resourceWrapper.getResourceType());
                HashMap<ResourceWrapper, ClusterNode> newMap = new HashMap<>(Math.max(clusterNodeMap.size(), 16));
                newMap.putAll(clusterNodeMap);
                // Cache it; the key is the node id, i.e. the resource's ResourceWrapper
                newMap.put(node.getId(), clusterNode);
                clusterNodeMap = newMap;
            }
        }
    }
    // Link the resource's DefaultNode to its ClusterNode
    node.setClusterNode(clusterNode);
    // If the request has an origin, record the origin node on the current entry
    if (!"".equals(context.getOrigin())) {
        Node originNode = node.getClusterNode().getOrCreateOriginNode(context.getOrigin());
        context.getCurEntry().setOriginNode(originNode);
    }
    // Continue with the next slot
    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}
2.5. StatisticSlot
StatisticSlot collects real-time call statistics, including runtime information (request count, thread count) and origin information.
StatisticSlot is key to rate limiting. It maintains counters based on the sliding time window algorithm to count the requests entering a resource.
Core code:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node,
                  int count, boolean prioritized, Object... args) throws Throwable {
    try {
        // Pass to the following slots first; they perform the rate-limiting, degradation, and other checks
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
        // The request passed: increment the thread counter (used for thread isolation)
        node.increaseThreadNum();
        // Increment the pass counter (used for rate limiting)
        node.addPassRequest(count);
        if (context.getCurEntry().getOriginNode() != null) {
            // If there is an origin, increment the origin node's counters too
            context.getCurEntry().getOriginNode().increaseThreadNum();
            context.getCurEntry().getOriginNode().addPassRequest(count);
        }
        if (resourceWrapper.getEntryType() == EntryType.IN) {
            // For inbound resources, also increment the global counters.
            Constants.ENTRY_NODE.increaseThreadNum();
            Constants.ENTRY_NODE.addPassRequest(count);
        }
        // Callbacks after the request passes.
        for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
            handler.onPass(context, resourceWrapper, node, count, args);
        }
    } catch (Throwable e) {
        // The various exception-handling branches are omitted here...
        context.getCurEntry().setError(e);
        throw e;
    }
}
Note that every +1 action updates two counters. Take node.addPassRequest(count) as an example:
@Override
public void addPassRequest(int count) {
    // DefaultNode counter: the count for the current chain
    super.addPassRequest(count);
    // ClusterNode counter: the resource's total across all chains
    this.clusterNode.addPassRequest(count);
}
We will look at how the counting itself works later.
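The key detail in StatisticSlot is ordering: it calls fireEntry first and increments the pass counters only if the downstream rule slots did not throw. The hypothetical, single-threaded sketch below shows that ordering. The `Rule` interface stands in for the downstream slots and `BlockedException` for BlockException, and blocked requests never reach the pass counter. Counting them in a separate block counter reflects the full Sentinel source, whose BlockException branch (elided in the code above) increments the block QPS.

```java
class BlockedException extends Exception {
    BlockedException(String msg) { super(msg); }
}

class SketchStats {
    int threadNum;
    long passCount;
    long blockCount;
}

// Stand-in for the rule-checking slots that run after StatisticSlot.
interface Rule {
    void check(SketchStats stats) throws BlockedException;
}

class SketchStatisticSlot {
    final SketchStats stats = new SketchStats();
    private final Rule downstream;

    SketchStatisticSlot(Rule downstream) { this.downstream = downstream; }

    boolean entry() {
        try {
            downstream.check(stats); // 1. let the rule slots decide first
            stats.threadNum++;       // 2. reached only if nothing was thrown
            stats.passCount++;
            return true;
        } catch (BlockedException e) {
            stats.blockCount++;      // blocked requests go to a separate counter
            return false;
        }
    }

    // Called when a request that passed finishes.
    void exit() {
        stats.threadNum--;
    }
}
```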
Next come the rule-checking slots, in order:
- AuthoritySlot: enforces authorization rules (origin control)
- SystemSlot: enforces system protection rules
- ParamFlowSlot: enforces hot-parameter flow-control rules
- FlowSlot: enforces flow-control rules
- DegradeSlot: enforces degradation (circuit-breaking) rules
2.6. AuthoritySlot
AuthoritySlot checks authorization rules against the request origin, as shown in the figure:
Core API:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count, boolean prioritized, Object... args)
    throws Throwable {
    // Check the blacklist/whitelist
    checkBlackWhiteAuthority(resourceWrapper, context);
    // Move on to the next slot
    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}
The blacklist/whitelist check logic:
void checkBlackWhiteAuthority(ResourceWrapper resource, Context context) throws AuthorityException {
    // Get all authorization rules
    Map<String, Set<AuthorityRule>> authorityRules = AuthorityRuleManager.getAuthorityRules();
    if (authorityRules == null) {
        return;
    }
    Set<AuthorityRule> rules = authorityRules.get(resource.getName());
    if (rules == null) {
        return;
    }
    // Check each rule
    for (AuthorityRule rule : rules) {
        if (!AuthorityRuleChecker.passCheck(rule, context)) {
            // Rule not satisfied: throw immediately
            throw new AuthorityException(context.getOrigin(), rule);
        }
    }
}
Next, look at AuthorityRuleChecker.passCheck(rule, context):
static boolean passCheck(AuthorityRule rule, Context context) {
    // Get the request origin
    String requester = context.getOrigin();
    // If the origin or the rule's app list is empty, let the request through
    if (StringUtil.isEmpty(requester) || StringUtil.isEmpty(rule.getLimitApp())) {
        return true;
    }
    // rule.getLimitApp() returns the whitelist or blacklist string; start with a quick indexOf check
    int pos = rule.getLimitApp().indexOf(requester);
    boolean contain = pos > -1;
    if (contain) {
        // The string contains the origin: confirm with an exact match by splitting the list on "," and comparing each entry
        boolean exactlyMatch = false;
        String[] appArray = rule.getLimitApp().split(",");
        for (String app : appArray) {
            if (requester.equals(app)) {
                exactlyMatch = true;
                break;
            }
        }
        contain = exactlyMatch;
    }
    // Blacklist that contains the origin: reject
    int strategy = rule.getStrategy();
    if (strategy == RuleConstant.AUTHORITY_BLACK && contain) {
        return false;
    }
    // Whitelist that does not contain the origin: reject
    if (strategy == RuleConstant.AUTHORITY_WHITE && !contain) {
        return false;
    }
    // Pass in all other cases
    return true;
}
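The exact-match step matters because indexOf alone would treat the origin "app" as contained in "appA,appB". The standalone port below reproduces the passCheck logic above with plain parameters so it can run without Sentinel on the classpath. `AuthorityCheck` is a hypothetical name, and `WHITE`/`BLACK` are local stand-ins for RuleConstant.AUTHORITY_WHITE/AUTHORITY_BLACK.

```java
class AuthorityCheck {
    static final int WHITE = 0; // local stand-in for RuleConstant.AUTHORITY_WHITE
    static final int BLACK = 1; // local stand-in for RuleConstant.AUTHORITY_BLACK

    static boolean passCheck(String limitApp, int strategy, String origin) {
        if (origin == null || origin.isEmpty() || limitApp == null || limitApp.isEmpty()) {
            return true; // nothing to compare: let it through
        }
        boolean contain = limitApp.indexOf(origin) > -1; // cheap substring pre-check
        if (contain) {
            boolean exactlyMatch = false;
            for (String app : limitApp.split(",")) {
                if (origin.equals(app)) {
                    exactlyMatch = true;
                    break;
                }
            }
            contain = exactlyMatch;
        }
        if (strategy == BLACK && contain) {
            return false;
        }
        if (strategy == WHITE && !contain) {
            return false;
        }
        return true;
    }
}
```

The list is split on "," without trimming. A value written as "appA, appB" would therefore not match the origin appB.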
2.7. SystemSlot
SystemSlot checks system protection rules:
Core API:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node,
                  int count, boolean prioritized, Object... args) throws Throwable {
    // Check system rules
    SystemRuleManager.checkSystem(resourceWrapper);
    // Move on to the next slot
    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}
Here is the code of SystemRuleManager.checkSystem(resourceWrapper):
public static void checkSystem(ResourceWrapper resourceWrapper) throws BlockException {
    if (resourceWrapper == null) {
        return;
    }
    // Ensure the checking switch is on.
    if (!checkSystemStatus.get()) {
        return;
    }
    // Only inbound resources are checked; all others return immediately
    if (resourceWrapper.getEntryType() != EntryType.IN) {
        return;
    }
    // Global QPS check
    double currentQps = Constants.ENTRY_NODE == null ? 0.0 : Constants.ENTRY_NODE.successQps();
    if (currentQps > qps) {
        throw new SystemBlockException(resourceWrapper.getName(), "qps");
    }
    // Global thread count check
    int currentThread = Constants.ENTRY_NODE == null ? 0 : Constants.ENTRY_NODE.curThreadNum();
    if (currentThread > maxThread) {
        throw new SystemBlockException(resourceWrapper.getName(), "thread");
    }
    // Global average RT check
    double rt = Constants.ENTRY_NODE == null ? 0 : Constants.ENTRY_NODE.avgRt();
    if (rt > maxRt) {
        throw new SystemBlockException(resourceWrapper.getName(), "rt");
    }
    // Global system load check (blocks only if the BBR check also fails)
    if (highestSystemLoadIsSet && getCurrentSystemAvgLoad() > highestSystemLoad) {
        if (!checkBbr(currentThread)) {
            throw new SystemBlockException(resourceWrapper.getName(), "load");
        }
    }
    // Global CPU usage check
    if (highestCpuUsageIsSet && getCurrentCpuUsage() > highestCpuUsage) {
        throw new SystemBlockException(resourceWrapper.getName(), "cpu");
    }
}
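The code above calls checkBbr but doesn't show it. Sentinel's system adaptive protection documentation says the load check only blocks when the current concurrency also exceeds the estimated system capacity, estimated as maxQps × minRt. My reading of the source is that it also requires more than one thread in flight. Below is a sketch of that condition with plain numbers. `BbrSketch` and its parameters are hypothetical; in Sentinel the inputs come from Constants.ENTRY_NODE.

```java
class BbrSketch {
    // Returns true if the request may pass the load check.
    // maxSuccessQps: the highest per-second success count recently observed
    // minRtMs: the lowest observed response time, in milliseconds
    static boolean checkBbr(int currentThread, double maxSuccessQps, double minRtMs) {
        // Little's law: sustainable concurrency is roughly throughput times latency
        double estimatedCapacity = maxSuccessQps * minRtMs / 1000.0;
        return !(currentThread > 1 && currentThread > estimatedCapacity);
    }
}
```

For example, with maxSuccessQps = 200 and minRt = 50 ms, the estimated capacity is 10 concurrent requests. Under high load, 8 in-flight threads still pass, while 12 would be blocked.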
2.8. ParamFlowSlot
ParamFlowSlot enforces hot-parameter flow control, as shown in the figure:
It limits requests to a resource by counting QPS separately for each distinct value of a request parameter.
- The per-instance threshold here is the maximum number of tokens: maxCount
- The statistics window duration here is the statistical period: duration
That is, at most maxCount tokens are produced per duration. The configuration in the figure above means 2 tokens are produced every 1 second.
Core API:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node,
                  int count, boolean prioritized, Object... args) throws Throwable {
    // No hot-parameter rules configured: pass straight through
    if (!ParamFlowRuleManager.hasRules(resourceWrapper.getName())) {
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
        return;
    }
    // Hot-parameter rule check
    checkFlow(resourceWrapper, count, args);
    // Move on to the next slot
    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}
2.8.1. Token Bucket
The hot-parameter check uses the token bucket algorithm, keeping a separate token bucket for each parameter value. Sentinel's token bucket consists of two parts:
Both Maps are keyed by the request parameter value, but they store different values:
- tokenCounters: records the number of remaining tokens
- timeCounters: records when tokens were last added (refilled) for that value
When a request with parameters arrives, the basic check works as follows:
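Here is a single-threaded, simplified sketch of that per-value token bucket, modeled on the two maps described above. `ParamTokenBucket` and `tryAcquire` are hypothetical names, and the current time is passed in explicitly so the behavior is deterministic. Sentinel's real checker also handles burst counts, per-value exception items and concurrent updates with CAS, which are all left out here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, single-threaded per-value token bucket (no burst, no CAS).
class ParamTokenBucket {
    private final long maxCount;   // tokens per duration (the per-instance threshold)
    private final long durationMs; // statistics window length in ms
    private final Map<Object, Long> tokenCounters = new HashMap<>(); // value -> tokens left
    private final Map<Object, Long> timeCounters = new HashMap<>();  // value -> last refill time

    ParamTokenBucket(long maxCount, long durationMs) {
        this.maxCount = maxCount;
        this.durationMs = durationMs;
    }

    boolean tryAcquire(Object value, long nowMs, int acquireCount) {
        if (acquireCount > maxCount) {
            return false; // can never fit in a full bucket
        }
        Long lastRefill = timeCounters.get(value);
        if (lastRefill == null) {
            // First request for this value: start from a full bucket and take tokens from it.
            timeCounters.put(value, nowMs);
            tokenCounters.put(value, maxCount - acquireCount);
            return true;
        }
        long tokens = tokenCounters.get(value);
        long passed = nowMs - lastRefill;
        if (passed > durationMs) {
            // Refill in proportion to elapsed time, capped at a full bucket
            // (after more than one full duration this is always a full refill).
            long toAdd = passed * maxCount / durationMs;
            long newTokens = Math.min(tokens + toAdd, maxCount) - acquireCount;
            tokenCounters.put(value, newTokens);
            timeCounters.put(value, nowMs);
            return true;
        }
        // Still inside the current duration: only spend what is left.
        if (tokens < acquireCount) {
            return false;
        }
        tokenCounters.put(value, tokens - acquireCount);
        return true;
    }
}
```

With the configuration from the figure (2 tokens per second), the first two requests for value 101 pass and the third is rejected. Value 102 has its own independent bucket.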
2.9. FlowSlot
FlowSlot checks flow-control rules, as shown in the figure:
The rules include:
- Three flow-control modes: direct mode, related (association) mode, chain mode
- Three flow-control effects: fail fast, warm up, queue waiting
From the perspective of the underlying statistics, the three modes fall into two categories:
- Limiting based on all requests to a resource (ClusterNode): direct mode, related mode
- Limiting based on requests arriving through a particular chain (DefaultNode): chain mode
From the perspective of the limiting algorithm, the three effects fall into two categories:
- Sliding time window algorithm: fail fast, warm up
- Leaky bucket algorithm: queue waiting
2.9.1. Core flow
The core API is as follows:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                  boolean prioritized, Object... args) throws Throwable {
    // Check the flow rules
    checkFlow(resourceWrapper, context, node, count, prioritized);
    // Pass to the next slot
    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}
The checkFlow method:
void checkFlow(ResourceWrapper resource, Context context, DefaultNode node, int count, boolean prioritized)
    throws BlockException {
    // checker is an instance of FlowRuleChecker
    checker.checkFlow(ruleProvider, resource, context, node, count, prioritized);
}
Inside FlowRuleChecker:
public void checkFlow(Function<String, Collection<FlowRule>> ruleProvider,
                      ResourceWrapper resource, Context context, DefaultNode node,
                      int count, boolean prioritized) throws BlockException {
    if (ruleProvider == null || resource == null) {
        return;
    }
    // Get all flow rules of the current resource
    Collection<FlowRule> rules = ruleProvider.apply(resource.getName());
    if (rules != null) {
        for (FlowRule rule : rules) {
            // Check the rules one by one
            if (!canPassCheck(rule, context, node, count, prioritized)) {
                throw new FlowException(rule.getLimitApp(), rule);
            }
        }
    }
}
FlowRule is the flow-rule class. Several of its fields correspond directly to the parameters of the flow-rule form in the dashboard:
public class FlowRule extends AbstractRule {
    /**
     * Threshold type (0: thread count, 1: QPS).
     */
    private int grade = RuleConstant.FLOW_GRADE_QPS;
    /**
     * Threshold.
     */
    private double count;
    /**
     * The three flow-control modes (strategies).
     *
     * {@link RuleConstant#STRATEGY_DIRECT} direct mode;
     * {@link RuleConstant#STRATEGY_RELATE} related (association) mode;
     * {@link RuleConstant#STRATEGY_CHAIN} chain mode.
     */
    private int strategy = RuleConstant.STRATEGY_DIRECT;
    /**
     * Referenced resource: the related resource in related mode, the entry resource in chain mode.
     */
    private String refResource;
    /**
     * Flow-control effect (the dashboard offers the first three).
     * 0. fail fast, 1. warm up, 2. queue waiting, 3. warm up + queue waiting
     */
    private int controlBehavior = RuleConstant.CONTROL_BEHAVIOR_DEFAULT;
    // Warm-up period (seconds)
    private int warmUpPeriodSec = 10;
    /**
     * Maximum queueing time (ms).
     */
    private int maxQueueingTimeMs = 500;
    // ... (omitted)
}
The check logic is defined in the canPassCheck method of FlowRuleChecker:
public boolean canPassCheck(/*@NonNull*/ FlowRule rule, Context context, DefaultNode node, int acquireCount,
                            boolean prioritized) {
    // Get limitApp: the origin this rule applies to ("default" means any origin)
    String limitApp = rule.getLimitApp();
    if (limitApp == null) {
        return true;
    }
    // Check the rule locally
    return passLocalCheck(rule, context, node, acquireCount, prioritized);
}
Step into passLocalCheck():
private static boolean passLocalCheck(FlowRule rule, Context context, DefaultNode node,
                                      int acquireCount, boolean prioritized) {
    // Choose whose statistics to use based on the flow-control mode:
    // a ClusterNode for direct and related mode, the DefaultNode for chain mode
    Node selectedNode = selectNodeByRequesterAndStrategy(rule, context, node);
    if (selectedNode == null) {
        return true;
    }
    // Apply the rule
    return rule.getRater().canPass(selectedNode, acquireCount, prioritized);
}
To apply the rule, it first calls FlowRule#getRater() to obtain the traffic shaping controller (TrafficShapingController), which then performs the actual limiting.
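passLocalCheck hands node selection to selectNodeByRequesterAndStrategy, which isn't shown. For a rule whose limitApp is "default", the choice comes down to the strategy. The sketch below is a simplified reading of that choice, not Sentinel's code. `NodeRef`, `Strategy` and the `clusterNodeOf` map are hypothetical stand-ins; in Sentinel, the related-mode lookup goes through ClusterBuilderSlot.getClusterNode(refResource).

```java
import java.util.Map;

enum Strategy { DIRECT, RELATE, CHAIN }

// Hypothetical placeholder for a statistics node.
class NodeRef {
    final String label;
    NodeRef(String label) { this.label = label; }
}

class NodeSelection {
    // Returns the node whose statistics the rule is checked against,
    // or null if the rule does not apply to this request.
    static NodeRef select(Strategy strategy, String refResource, String contextName,
                          NodeRef defaultNode, NodeRef clusterNode,
                          Map<String, NodeRef> clusterNodeOf) {
        if (strategy == Strategy.DIRECT) {
            return clusterNode;                        // all traffic to this resource
        }
        if (refResource == null || refResource.isEmpty()) {
            return null;                               // related/chain mode need a reference
        }
        if (strategy == Strategy.RELATE) {
            return clusterNodeOf.get(refResource);     // traffic of the related resource
        }
        // CHAIN: only traffic that entered through the named entry (context)
        return refResource.equals(contextName) ? defaultNode : null;
    }
}
```

A null result makes passLocalCheck return true, so the rule simply does not apply to that request.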
TrafficShapingController has three implementations:
- DefaultController: fail fast; the default behavior, based on the sliding time window algorithm
- WarmUpController: warm-up mode; based on the sliding time window algorithm, but with a dynamic threshold
- RateLimiterController: queue waiting mode; based on the leaky bucket algorithm
The final limiting decision is made in the canPass method of TrafficShapingController.
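Of the three, the queueing controller is the easiest to reason about with numbers. The sketch below follows the leaky-bucket idea behind RateLimiterController as a deterministic, single-threaded model. Requests are spaced `1000 / count` ms apart, and a request whose wait would exceed `maxQueueingTimeMs` is rejected. Instead of sleeping, `acquire` returns the wait time, or -1 for a rejection. The class name and that return convention are hypothetical.

```java
class QueueingSketch {
    private final double count;           // allowed QPS
    private final long maxQueueingTimeMs; // longest a request may wait
    private long latestPassedTime = -1;   // when the last admitted request is scheduled to pass

    QueueingSketch(double count, long maxQueueingTimeMs) {
        this.count = count;
        this.maxQueueingTimeMs = maxQueueingTimeMs;
    }

    // Returns the wait in ms before the request may proceed, or -1 if it is rejected.
    long acquire(long nowMs, int acquireCount) {
        if (count <= 0) {
            return -1;
        }
        long costTime = Math.round(1.0 * acquireCount / count * 1000); // spacing between requests
        long expectedTime = latestPassedTime + costTime;
        if (expectedTime <= nowMs) {
            latestPassedTime = nowMs;    // bucket has drained: pass immediately
            return 0;
        }
        long waitTime = expectedTime - nowMs;
        if (waitTime > maxQueueingTimeMs) {
            return -1;                   // queue too long: reject
        }
        latestPassedTime = expectedTime; // reserve the next slot
        return waitTime;
    }
}
```

With count = 5 and maxQueueingTimeMs = 500, four requests arriving at the same instant get waits of 0, 200 and 400 ms, and the fourth is rejected because it would need 600 ms.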
2.9.2. Sliding time window
The sliding time window has two parts:
- Counting the requests in each time window bucket; this is called from StatisticSlot
- Summing the bucket counts within the sliding window to get QPS; this is called when FlowSlot checks the flow rules
Let's first look at how requests are counted per time window.
2.9.2.1. Counting requests per time window
Looking back at the StatisticSlot section (2.5), there is this piece of code:
It counts the requests passing through the node. Following the call, we land inside DefaultNode:
QPS is counted on both the DefaultNode and the ClusterNode. Since DefaultNode and ClusterNode are both subclasses of StatisticNode, both addPassRequest() calls here eventually end up in StatisticNode.
Follow either one:
Statistics are kept at two granularities, per second and per minute, each with its own counter. Looking at the corresponding fields:
Both counters are of type ArrayMetric and are constructed with two parameters:
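// intervalInMs: the length of the sliding window, 1 second for the per-second counter
// sampleCount: the number of buckets the window is split into, 2 by default, i.e. 1 second is split into two buckets
public ArrayMetric(int sampleCount, int intervalInMs) {
    this.data = new OccupiableBucketLeapArray(sampleCount, intervalInMs);
}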
As shown in the figure:
Next, step into the addPass method of ArrayMetric:
@Override
public void addPass(int count) {
    // Get the bucket (window) that contains the current time
    WindowWrap<MetricBucket> wrap = data.currentWindow();
    // Add count to that bucket's pass counter
    wrap.value().addPass(count);
}
So, how does the counter know which window it is currently in?
The data here is a LeapArray:
The four fields of LeapArray:
public abstract class LeapArray<T> {
    // Length of each bucket, 500 ms by default (= intervalInMs / sampleCount)
    protected int windowLengthInMs;
    // Number of buckets in the sliding window, 2 by default
    protected int sampleCount;
    // Length of the sliding window, 1000 ms by default
    protected int intervalInMs;
    // Length of the sliding window in seconds, 1 by default
    private double intervalInSecond;
}
LeapArray is a circular array. Time is unbounded, but the array cannot be, so each slot in the array holds one time window (bucket). When the index reaches the end of the array, it wraps back to 0 and overwrites the old window there.
A sliding window consists of sampleCount buckets, and in Sentinel the array itself has sampleCount slots. A slot is therefore only overwritten after its old bucket has already slid out of the current window, so data that is still needed is never overwritten.
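currentWindow computes the slot index and the window start (calculateTimeIdx and calculateWindowStart) with plain integer arithmetic. Below is a standalone sketch using the default 500 ms buckets and a 2-slot array. As far as I can tell, the formulas mirror the corresponding LeapArray methods, but `WindowMath` itself is a hypothetical stand-in.

```java
class WindowMath {
    // Which array slot a timestamp falls into.
    static int timeIdx(long timeMillis, int windowLengthInMs, int arrayLength) {
        long timeId = timeMillis / windowLengthInMs; // bucket number since the epoch
        return (int) (timeId % arrayLength);         // wrap around the ring
    }

    // Start time of the bucket that contains the timestamp.
    static long windowStart(long timeMillis, int windowLengthInMs) {
        return timeMillis - timeMillis % windowLengthInMs;
    }
}
```

At t = 1200 ms the request lands in slot 0 with window start 1000. At 1700 it lands in slot 1 (start 1500). At 2100 it wraps back to slot 0 (start 2000), which is case (3) in currentWindow: the stale bucket that started at 1000 gets reset.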
Now follow data.currentWindow(). The no-argument version passes the current time to the method below:
public WindowWrap<T> currentWindow(long timeMillis) {
    if (timeMillis < 0) {
        return null;
    }
    // Compute the array index for the current time
    int idx = calculateTimeIdx(timeMillis);
    // Compute the start time of the window containing the current time.
    long windowStart = calculateWindowStart(timeMillis);
    /*
     * First fetch the oldWindow stored at this index. It may hold stale data, so check it:
     *
     * (1) oldWindow does not exist: this is the first time, so create a new window, store it, and return it.
     * (2) oldWindow's startTime == this request's windowStart: it is exactly the window we want, return it.
     * (3) oldWindow's startTime < this request's windowStart: it is stale, so create a new window
     *     that overwrites the old one.
     */
    while (true) {
        WindowWrap<T> old = array.get(idx);
        if (old == null) {
            // Create a new window
            WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            // Write it into the array with CAS to stay thread-safe
            if (array.compareAndSet(idx, null, window)) {
                // CAS succeeded: return the new window
                return window;
            } else {
                // CAS failed: another thread updated it concurrently; yield and retry
                Thread.yield();
            }
        } else if (windowStart == old.windowStart()) {
            return old;
        } else if (windowStart > old.windowStart()) {
            if (updateLock.tryLock()) {
                try {
                    // Got the update lock: reset the old window and return it
                    return resetWindowTo(old, windowStart);
                } finally {
                    updateLock.unlock();
                }
            } else {
                // Did not get the lock: another thread is resetting it; yield and retry
                Thread.yield();
            }
        } else if (windowStart < old.windowStart()) {
            // Should not happen normally; handled here just in case.
            return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
        }
    }
}
Once the window (WindowWrap) for the current time has been found, calling addPass on the MetricBucket it wraps (wrap.value()) adds count to that window's counter.
This code only counts requests per window; it does not block anything. Blocking requests that exceed the limit is handled by the logic in FlowSlot.
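The lookup-then-count flow can be condensed into the sketch below. It is a simplification under stated assumptions, not Sentinel's code: SimpleLeapArray and Window are hypothetical names, time is passed in explicitly so the behavior is reproducible, and a single synchronized lock replaces the CAS plus tryLock fast path used by the real LeapArray.

```java
class SimpleLeapArray {
    // One small window: its start time and its pass counter
    static final class Window {
        long windowStart;
        long pass;

        Window(long windowStart) {
            this.windowStart = windowStart;
        }
    }

    private final int windowLengthInMs;
    private final Window[] array;

    SimpleLeapArray(int sampleCount, int intervalInMs) {
        this.windowLengthInMs = intervalInMs / sampleCount;
        this.array = new Window[sampleCount];
    }

    // Same cases as LeapArray.currentWindow, guarded by one lock instead of CAS + tryLock
    synchronized Window currentWindow(long timeMillis) {
        int idx = (int) ((timeMillis / windowLengthInMs) % array.length);
        long windowStart = timeMillis - timeMillis % windowLengthInMs;
        Window old = array[idx];
        if (old == null) {
            // (1) First use of this slot: create and store a new window
            old = new Window(windowStart);
            array[idx] = old;
        } else if (windowStart > old.windowStart) {
            // (3) The slot holds a stale window: reset it for the new time span
            old.windowStart = windowStart;
            old.pass = 0;
        } else if (windowStart < old.windowStart) {
            // Timestamp older than the slot's window: hand back a detached window
            return new Window(windowStart);
        }
        // (2) The slot now holds the window for timeMillis
        return old;
    }

    // Like ArrayMetric.addPass: locate the current window, then add to its counter
    synchronized void addPass(long timeMillis, int count) {
        currentWindow(timeMillis).pass += count;
    }

    public static void main(String[] args) {
        SimpleLeapArray data = new SimpleLeapArray(2, 1000);
        data.addPass(1200, 1); // slot 0, window [1000, 1500)
        data.addPass(1300, 1); // same window, pass = 2
        data.addPass(1700, 1); // slot 1, window [1500, 2000)
        data.addPass(2300, 1); // slot 0 again: window 1000 is stale and is reset
        Window w = data.currentWindow(2300);
        System.out.println(w.windowStart + " " + w.pass); // 2000 1
    }
}
```

The coarse lock keeps the sketch obviously correct; the real LeapArray avoids it on the hot path because most calls hit case (2) and need no write at all.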
2.9.2.2. Sliding window QPS calculation
As mentioned in section 2.9.1, FlowSlot's flow-control check is ultimately implemented by the canPass method of the TrafficShapingController interface.
This interface has three implementation classes:
- DefaultController: fail fast, the default mode, based on the sliding time window algorithm
- WarmUpController: Warm-up mode, based on the sliding time window algorithm, but the threshold is dynamic
- RateLimiterController: queue waiting mode, based on leaky bucket algorithm
So let's analyze the canPass method of the default implementation, DefaultController:
@Override
public boolean canPass(Node node, int acquireCount, boolean prioritized) {
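// Count the requests already recorded in the sliding window so far
int curCount = avgUsedTokens(node);
// Check: used count + requested count (usually 1) > the threshold?
if (curCount + acquireCount > count) {
// Greater: the threshold is exceeded, return false (prioritized QPS requests may first try to borrow from the next window)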
if (prioritized && grade == RuleConstant.FLOW_GRADE_QPS) {
long currentTime;
long waitInMs;
currentTime = TimeUtil.currentTimeMillis();
waitInMs = node.tryOccupyNext(currentTime, acquireCount, count);
if (waitInMs < OccupyTimeoutProperty.getOccupyTimeout()) {
node.addWaitingRequest(currentTime + waitInMs, acquireCount);
node.addOccupiedPass(acquireCount);
sleep(waitInMs);
// PriorityWaitException indicates that the request will pass after waiting for {@link @waitInMs}.
throw new PriorityWaitException(waitInMs);
}
}
return false;
}
// Less than or equal: within the threshold, return true
return true;
}
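The fail-fast decision boils down to one comparison. The sketch below restates avgUsedTokens and the non-prioritized branch of canPass as plain functions; FailFastCheck and its Grade enum are hypothetical stand-ins for the controller and RuleConstant.FLOW_GRADE_QPS / FLOW_GRADE_THREAD, and the prioritized (borrow-from-next-window) path is left out.

```java
class FailFastCheck {
    // Hypothetical stand-ins for RuleConstant.FLOW_GRADE_QPS and FLOW_GRADE_THREAD
    enum Grade { QPS, THREAD }

    // Like avgUsedTokens: thread count for thread rules, pass QPS truncated to int for QPS rules
    static int avgUsedTokens(Grade grade, double passQps, int curThreadNum) {
        return grade == Grade.THREAD ? curThreadNum : (int) passQps;
    }

    // Like the non-prioritized path of DefaultController.canPass
    static boolean canPass(Grade grade, double passQps, int curThreadNum,
                           int acquireCount, double count) {
        int curCount = avgUsedTokens(grade, passQps, curThreadNum);
        return curCount + acquireCount <= count;
    }

    public static void main(String[] args) {
        // QPS threshold of 10
        System.out.println(canPass(Grade.QPS, 9.0, 0, 1, 10));     // true: 9 + 1 <= 10
        System.out.println(canPass(Grade.QPS, 10.0, 0, 1, 10));    // false: 10 + 1 > 10
        // Thread threshold of 5: QPS is ignored, 3 busy threads + 1 <= 5
        System.out.println(canPass(Grade.THREAD, 100.0, 3, 1, 5)); // true
    }
}
```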
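So the decision hinges on the line int curCount = avgUsedTokens(node). avgUsedTokens returns the current thread count for thread-count rules and the pass QPS for QPS rules: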
private int avgUsedTokens(Node node) {
if (node == null) {
return DEFAULT_AVG_USED_TOKENS;
}
return grade == RuleConstant.FLOW_GRADE_THREAD ? node.curThreadNum() : (int)(node.passQps());
}
Since we are looking at QPS-based flow control, follow the logic of node.passQps():
// This takes us back into the StatisticNode class
@Override
public double passQps() {
// Request count ÷ sliding window length in seconds = QPS
return rollingCounterInSecond.pass() / rollingCounterInSecond.getWindowIntervalInSec();
}
So how does rollingCounterInSecond.pass() get the request count?
// rollingCounterInSecond is essentially an ArrayMetric, as mentioned earlier
@Override
public long pass() {
// Get the current window (this also resets it if it is stale)
data.currentWindow();
long pass = 0;
// Get all small windows within the sliding window range for the current time
List<MetricBucket> list = data.values();
// Iterate
for (MetricBucket window : list) {
// Accumulate the sum
pass += window.pass();
}
// Return the total
return pass;
}
Let's see how data.values() collects all the small windows within the sliding window range (the no-argument values() delegates to values(long) with the current time):
// Now inside the LeapArray class:
public List<T> values(long timeMillis) {
if (timeMillis < 0) {
return new ArrayList<T>();
}
// Create an empty list whose capacity equals the LeapArray length
int size = array.length();
List<T> result = new ArrayList<T>(size);
// Iterate over the LeapArray
for (int i = 0; i < size; i++) {
// Get each small window
WindowWrap<T> windowWrap = array.get(i);
// Check whether this small window lies within the sliding window range (the last second)
if (windowWrap == null || isWindowDeprecated(timeMillis, windowWrap)) {
// Empty or out of range: skip it
continue;
}
// In range: add it to the list
result.add(windowWrap.value());
}
// Return the list
return result;
}
So how does isWindowDeprecated(timeMillis, windowWrap) decide whether a window qualifies?
public boolean isWindowDeprecated(long time, WindowWrap<T> windowWrap) {
// Is (current time - window start) greater than the sliding window length (1 second)?
// In other words, we sum the counts of the small windows that started within 1 second of now
return time - windowWrap.windowStart() > intervalInMs;
}
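Putting the read side together, here is a sketch of how pass() and passQps() combine the non-deprecated windows. Bucket and SlidingWindowQps are hypothetical names; buckets are given as fixed (windowStart, pass) pairs rather than a live LeapArray, and the window length is assumed to be the default 1000 ms.

```java
class SlidingWindowQps {
    // Minimal stand-in for WindowWrap<MetricBucket>: start time and pass count of one small window
    static final class Bucket {
        final long windowStart;
        final long pass;

        Bucket(long windowStart, long pass) {
            this.windowStart = windowStart;
            this.pass = pass;
        }
    }

    // Assumed sliding window length (default 1 second)
    static final int INTERVAL_IN_MS = 1000;

    // Same rule as LeapArray.isWindowDeprecated
    static boolean isWindowDeprecated(long time, Bucket bucket) {
        return time - bucket.windowStart > INTERVAL_IN_MS;
    }

    // Like LeapArray.values(time) followed by the summation in ArrayMetric.pass()
    static long pass(Bucket[] array, long time) {
        long sum = 0;
        for (Bucket bucket : array) {
            if (bucket == null || isWindowDeprecated(time, bucket)) {
                continue; // empty slot or outside the last second
            }
            sum += bucket.pass;
        }
        return sum;
    }

    // Like StatisticNode.passQps(): request count divided by the window length in seconds
    static double passQps(Bucket[] array, long time) {
        return pass(array, time) / (INTERVAL_IN_MS / 1000.0);
    }

    public static void main(String[] args) {
        // Two 500 ms slots: [2000, 2500) with 30 passes and [1500, 2000) with 50 passes
        Bucket[] array = {new Bucket(2000, 30), new Bucket(1500, 50)};
        System.out.println(passQps(array, 2300)); // 80.0: both windows are within the last second
        System.out.println(passQps(array, 2600)); // 30.0: 2600 - 1500 = 1100 > 1000, so that window is dropped
    }
}
```

At 2600 ms the stale bucket is skipped by isWindowDeprecated even without being reset; in the real code, the data.currentWindow() call at the start of pass() would also reset that slot for the new window.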
2.9.3. Leaky bucket
As mentioned in the previous section, FlowSlot's flow-control check is ultimately implemented by the canPass method of the TrafficShapingController interface.
This interface has three implementation classes:
- DefaultController: fail fast, the default mode, based on the sliding time window algorithm
- WarmUpController: Warm-up mode, based on the sliding time window algorithm, but the threshold is dynamic
- RateLimiterController: queue waiting mode, based on leaky bucket algorithm
This time we analyze the canPass method of RateLimiterController:
@Override
public boolean canPass(Node node, int acquireCount, boolean prioritized) {
// Pass when acquire count is less or equal than 0.
if (acquireCount <= 0) {
return true;
}
// Threshold <= 0: block the request
if (count <= 0) {
return false;
}
// Get the current time
long currentTime = TimeUtil.currentTimeMillis();
// Minimum interval (ms) allowed between two requests
long costTime = Math.round(1.0 * (acquireCount) / count * 1000);
// Time at which this request may run = pass time of the latest request + minimum interval
long expectedTime = costTime + latestPassedTime.get();
// If that time is not later than now, the request can run immediately
if (expectedTime <= currentTime) {
// Record this request's pass time
latestPassedTime.set(currentTime);
return true;
} else {
// Cannot run immediately: compute the expected wait
// expected wait = minimum interval + pass time of the latest request - current time
long waitTime = costTime + latestPassedTime.get() - TimeUtil.currentTimeMillis();
// Reject the request if the expected wait exceeds the queueing limit
if (waitTime > maxQueueingTimeMs) {
return false;
} else {
// Within the limit: push the latest pass time forward by costTime
long oldTime = latestPassedTime.addAndGet(costTime);
try {
// To be safe, check the expected wait against the limit once more
waitTime = oldTime - TimeUtil.currentTimeMillis();
if (waitTime > maxQueueingTimeMs) {
// If it is exceeded, subtract the time that was just added
latestPassedTime.addAndGet(-costTime);
// Reject
return false;
}
// in race condition waitTime may <= 0
if (waitTime > 0) {
// Within the limit: sleep for the wait time, then continue
Thread.sleep(waitTime);
}
return true;
} catch (InterruptedException e) {
}
}
}
return false;
}
This is essentially the leaky bucket algorithm analyzed earlier:
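A deterministic version of the same leaky bucket logic makes the timing easier to follow. In this sketch (LeakyBucketSketch is a hypothetical name), the clock is passed in as a parameter and acquire returns the wait in milliseconds, or -1 for rejection, instead of calling Thread.sleep; the arithmetic for costTime, expectedTime and the double check of waitTime follows the code above.

```java
import java.util.concurrent.atomic.AtomicLong;

class LeakyBucketSketch {
    private final double count;             // allowed requests per second
    private final long maxQueueingTimeMs;   // longest time a request may queue
    private final AtomicLong latestPassedTime = new AtomicLong(-1);

    LeakyBucketSketch(double count, long maxQueueingTimeMs) {
        this.count = count;
        this.maxQueueingTimeMs = maxQueueingTimeMs;
    }

    // Returns the wait in ms (0 = pass immediately) or -1 if the request is rejected
    long acquire(int acquireCount, long currentTime) {
        if (acquireCount <= 0) {
            return 0;
        }
        if (count <= 0) {
            return -1;
        }
        // Bucket time consumed by this request
        long costTime = Math.round(1.0 * acquireCount / count * 1000);
        long expectedTime = costTime + latestPassedTime.get();
        if (expectedTime <= currentTime) {
            latestPassedTime.set(currentTime);
            return 0;
        }
        long waitTime = costTime + latestPassedTime.get() - currentTime;
        if (waitTime > maxQueueingTimeMs) {
            return -1;
        }
        // Reserve a slot, then re-check in case other callers reserved slots meanwhile
        long oldTime = latestPassedTime.addAndGet(costTime);
        waitTime = oldTime - currentTime;
        if (waitTime > maxQueueingTimeMs) {
            latestPassedTime.addAndGet(-costTime); // roll back the reservation
            return -1;
        }
        return Math.max(waitTime, 0);
    }

    public static void main(String[] args) {
        // 10 requests per second -> 100 ms per request; queue for at most 250 ms
        LeakyBucketSketch bucket = new LeakyBucketSketch(10, 250);
        for (int i = 0; i < 4; i++) {
            // All four arrive at t = 1000 ms: prints 0, 100, 200, -1
            System.out.println(bucket.acquire(1, 1000));
        }
    }
}
```

With count = 10 each request costs 100 ms of bucket time, so three requests arriving together get waits of 0, 100 and 200 ms. A fourth would have to wait 300 ms, which exceeds the 250 ms queueing limit, so it is rejected.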
2.10.DegradeSlot
The last hurdle is checking the degradation (circuit-breaking) rules.
Sentinel implements degradation with a state machine:
The implementation is in the DegradeSlot class; its core method is:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node,
int count, boolean prioritized, Object... args) throws Throwable {
// Check the circuit-breaking (degradation) rules
performChecking(context, resourceWrapper);
// Move on to the next slot
fireEntry(context, resourceWrapper, node, count, prioritized, args);
}
Next, step into the performChecking method:
void performChecking(Context context, ResourceWrapper r) throws BlockException {
// Get all circuit breakers (CircuitBreaker) registered for the current resource
List<CircuitBreaker> circuitBreakers = DegradeRuleManager.getCircuitBreakers(r.getName());
if (circuitBreakers == null || circuitBreakers.isEmpty()) {
return;
}
for (CircuitBreaker cb : circuitBreakers) {
// Check the circuit breakers one by one; the first rejection throws DegradeException
if (!cb.tryPass(context)) {
throw new DegradeException(cb.getRule().getLimitApp(), cb.getRule());
}
}
}
2.10.1.CircuitBreaker
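Let's step into the tryPass method of CircuitBreaker (implemented in AbstractCircuitBreaker):
@Override
public boolean tryPass(Context context) {
// Check the state machine's current state
if (currentState.get() == State.CLOSED) {
// CLOSED: let the request through
return true;
}
if (currentState.get() == State.OPEN) {
// OPEN: the circuit breaker is open
// Check whether the OPEN retry window has elapsed; if so, switch from OPEN to HALF_OPEN and return true
return retryTimeoutArrived() && fromOpenToHalfOpen(context);
}
// HALF_OPEN: a probe request is already in progress, so return false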
return false;
}
The retry-window check lives in retryTimeoutArrived():
protected boolean retryTimeoutArrived() {
// Has the current time reached the next HALF_OPEN retry timestamp?
return TimeUtil.currentTimeMillis() >= nextRetryTimestamp;
}
The switch from OPEN to HALF_OPEN happens in fromOpenToHalfOpen(context):
protected boolean fromOpenToHalfOpen(Context context) {
// Switch the state from OPEN to HALF_OPEN with CAS
if (currentState.compareAndSet(State.OPEN, State.HALF_OPEN)) {
// Notify observers of the state change
notifyObservers(State.OPEN, State.HALF_OPEN, null);
// Get the current resource entry
Entry entry = context.getCurEntry();
// Register a callback that fires when the Entry is destroyed (the resource's business logic has finished)
entry.whenTerminate(new BiConsumer<Context, Entry>() {
@Override
public void accept(Context context, Entry entry) {
// Check whether the probe request was blocked (by a later rule)
if (entry.getBlockError() != null) {
// If so, go back to OPEN
currentState.compareAndSet(State.HALF_OPEN, State.OPEN);
notifyObservers(State.HALF_OPEN, State.OPEN, 1.0d);
}
}
});
return true;
}
return false;
}
This covers the switch from OPEN to HALF_OPEN and back from HALF_OPEN to OPEN, but two transitions are still missing:
- From CLOSED to OPEN
- From HALF_OPEN to CLOSED
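The whole state machine, including the two transitions listed above, fits in a few lines. The sketch below is a simplified illustration, not Sentinel's AbstractCircuitBreaker: the class name, the recoveryTimeoutMs field and the method names are assumptions, time is passed in explicitly, and observer notifications and Entry callbacks are omitted.

```java
import java.util.concurrent.atomic.AtomicReference;

class BreakerStateMachine {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final AtomicReference<State> currentState = new AtomicReference<>(State.CLOSED);
    private final long recoveryTimeoutMs;      // how long the breaker stays OPEN
    private volatile long nextRetryTimestamp;  // earliest time a probe may go through

    BreakerStateMachine(long recoveryTimeoutMs) {
        this.recoveryTimeoutMs = recoveryTimeoutMs;
    }

    State state() {
        return currentState.get();
    }

    // Same shape as tryPass: CLOSED passes; OPEN passes one probe once the retry time arrives
    boolean tryPass(long now) {
        State state = currentState.get();
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN) {
            // Only the caller that wins the CAS becomes the HALF_OPEN probe
            return now >= nextRetryTimestamp
                    && currentState.compareAndSet(State.OPEN, State.HALF_OPEN);
        }
        return false; // HALF_OPEN: a probe is already in flight
    }

    // CLOSED -> OPEN (threshold exceeded) or HALF_OPEN -> OPEN (probe failed)
    boolean toOpen(State expected, long now) {
        if (currentState.compareAndSet(expected, State.OPEN)) {
            nextRetryTimestamp = now + recoveryTimeoutMs;
            return true;
        }
        return false;
    }

    // HALF_OPEN -> CLOSED (probe succeeded)
    boolean fromHalfOpenToClose() {
        return currentState.compareAndSet(State.HALF_OPEN, State.CLOSED);
    }

    public static void main(String[] args) {
        BreakerStateMachine cb = new BreakerStateMachine(5000);
        cb.toOpen(State.CLOSED, 0);            // threshold exceeded at t = 0
        System.out.println(cb.tryPass(1000));  // false: still OPEN
        System.out.println(cb.tryPass(5000));  // true: retry time reached, now HALF_OPEN
        System.out.println(cb.tryPass(5001));  // false: the probe is still running
        cb.fromHalfOpenToClose();              // the probe succeeded
        System.out.println(cb.state());        // CLOSED
    }
}
```

Using compareAndSet for the OPEN to HALF_OPEN switch means only one caller wins the probe once the retry time arrives; every other request sees HALF_OPEN and is blocked until the probe's outcome moves the breaker to CLOSED or back to OPEN.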
2.10.2. Triggering the circuit breaker
After a request has passed through all slots, its exit method must be executed. In the exit method of DegradeSlot:
the onRequestComplete method of each CircuitBreaker is called. CircuitBreaker has two implementations, ExceptionCircuitBreaker and ResponseTimeCircuitBreaker:
Taking the exception-ratio circuit breaker, ExceptionCircuitBreaker, as an example, let's step into its onRequestComplete:
@Override
public void onRequestComplete(Context context) {
// Get the resource Entry
Entry entry = context.getCurEntry();
if (entry == null) {
return;
}
// Try to get the exception thrown by the resource
Throwable error = entry.getError();
// Get the counter; a sliding window is used for counting here as well
SimpleErrorCounter counter = stat.currentWindow().value();
if (error != null) {
// If an exception occurred, error counter +1
counter.getErrorCount().add(1);
}
// total counter +1, whether or not an exception occurred
counter.getTotalCount().add(1);
// Check whether the exception ratio exceeds the threshold
handleStateChangeWhenThresholdExceeded(error);
}
Now the threshold check:
private void handleStateChangeWhenThresholdExceeded(Throwable error) {
// Already OPEN: nothing to do
if (currentState.get() == State.OPEN) {
return;
}
// HALF_OPEN: decide whether to switch state
if (currentState.get() == State.HALF_OPEN) {
if (error == null) {
// No exception: switch from HALF_OPEN to CLOSED
fromHalfOpenToClose();
} else {
// The probe request failed: back to OPEN
fromHalfOpenToOpen(1.0d);
}
return;
}
// Otherwise the state is CLOSED: check whether the threshold is reached
List<SimpleErrorCounter> counters = stat.values();
long errCount = 0;
long totalCount = 0;
// Sum up the exception count and the total request count
for (SimpleErrorCounter counter : counters) {
errCount += counter.errorCount.sum();
totalCount += counter.totalCount.sum();
}
// Fewer requests than minRequestAmount in total: do nothing
if (totalCount < minRequestAmount) {
return;
}
double curCount = errCount;
if (strategy == DEGRADE_GRADE_EXCEPTION_RATIO) {
// Compute the exception ratio of the requests
curCount = errCount * 1.0d / totalCount;
}
// If the ratio (or, for the exception-count strategy, the count) exceeds the threshold, switch to OPEN
if (curCount > threshold) {
transformToOpen(curCount);
}
}
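The CLOSED-state check at the end is plain arithmetic over the live windows. The sketch below (ExceptionRatioCheck, Counter and Strategy are hypothetical names) assumes the counters passed in are already the non-deprecated windows that stat.values() would return.

```java
class ExceptionRatioCheck {
    // Hypothetical stand-ins for the exception-ratio and exception-count strategies
    enum Strategy { EXCEPTION_RATIO, EXCEPTION_COUNT }

    // One live sliding-window bucket, like SimpleErrorCounter
    static final class Counter {
        final long errorCount;
        final long totalCount;

        Counter(long errorCount, long totalCount) {
            this.errorCount = errorCount;
            this.totalCount = totalCount;
        }
    }

    // Returns true if a CLOSED breaker should switch to OPEN
    static boolean shouldOpen(Counter[] counters, Strategy strategy,
                              long minRequestAmount, double threshold) {
        long errCount = 0;
        long totalCount = 0;
        for (Counter counter : counters) {
            errCount += counter.errorCount;
            totalCount += counter.totalCount;
        }
        // Too few requests to judge
        if (totalCount < minRequestAmount) {
            return false;
        }
        double curCount = errCount;
        if (strategy == Strategy.EXCEPTION_RATIO) {
            curCount = errCount * 1.0d / totalCount;
        }
        return curCount > threshold;
    }

    public static void main(String[] args) {
        Counter[] counters = {new Counter(2, 4), new Counter(1, 4)}; // 3 errors out of 8
        System.out.println(shouldOpen(counters, Strategy.EXCEPTION_RATIO, 5, 0.3));  // true: 0.375 > 0.3
        System.out.println(shouldOpen(counters, Strategy.EXCEPTION_RATIO, 10, 0.3)); // false: only 8 requests
        System.out.println(shouldOpen(counters, Strategy.EXCEPTION_COUNT, 5, 5));    // false: 3 errors <= 5
    }
}
```

In the example, 3 errors out of 8 requests is a ratio of 0.375, above a 0.3 threshold, so the breaker would open; with minRequestAmount = 10 the same data is ignored because too few requests have been seen.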
```java