Spring Cloud Microservice Learning Series 11: Introduction to Service Degradation and Its Use

Service degradation

1. Introduction

What is service degradation? When server load rises sharply, some services and pages are, depending on actual business conditions and traffic, strategically not processed or processed in a simplified way. This frees server resources so that core transactions keep running normally and efficiently.
If that is still unclear, here is an example. Suppose many people want to pay me, but besides the payment service my server also runs other services, such as search, scheduled tasks and detail pages. These less important services consume a lot of the JVM's memory and CPU. To collect all the money (money is the goal), I design a dynamic switch that rejects these unimportant services directly at the outermost layer, so the back-end service that collects money gets more resources and collects money faster. This is a simple service degradation scenario.

2. Usage scenarios

What is service degradation mainly used for? When the overall load of the microservice architecture exceeds a preset upper threshold, or upcoming traffic is expected to exceed it, we can delay or suspend some unimportant or non-urgent services and tasks so that important or core services keep functioning properly.

3. Core design

3.1 Distributed switch

Based on the requirements above, we can implement service degradation with a distributed switch and manage the switch configuration centrally. The specific plan is as follows:

(Figure: Service degradation, distributed switch)
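To make the switch idea concrete, here is a minimal sketch of the local side of such a switch. It assumes that a configuration center pushes switch values into each instance; the class `DegradeSwitches` and its methods are hypothetical names, not a Spring Cloud API. Each instance keeps the latest switch states in a thread-safe map and checks them at the outermost layer before doing any real work.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical local view of a distributed degradation switch.
// In a real deployment the values would be pushed by a configuration center;
// here they are changed through update() so the sketch stays self-contained.
final class DegradeSwitches {
    private final Map<String, Boolean> switches = new ConcurrentHashMap<>();

    // Assumed to be called whenever the configuration center pushes a new value.
    void update(String service, boolean degraded) {
        switches.put(service, degraded);
    }

    // Services without a switch entry are treated as "not degraded".
    boolean isDegraded(String service) {
        return switches.getOrDefault(service, false);
    }

    // Guard a call: when the switch is on, reject early with the fallback
    // instead of spending CPU and memory on the real handler.
    <T> T guard(String service, Supplier<T> handler, Supplier<T> fallback) {
        return isDegraded(service) ? fallback.get() : handler.get();
    }
}
```

In the payment example, `guard("search", ...)` would return a cheap "temporarily unavailable" result once operations flip the `search` switch, while payment calls, which have no switch set, run normally.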

3.2 Automatic degradation

Timeout degradation - configure the timeout period, the number of timeout retries and the retry mechanism, and use an asynchronous mechanism to detect recovery
Failure-count degradation - mainly for unstable APIs: when the number of failed calls reaches a threshold, the API is degraded automatically, and an asynchronous mechanism should likewise detect when it recovers
Failure degradation - if the remote service is down (network failure, DNS failure, the HTTP service returns an error status code, or the RPC service throws an exception), degrade directly
Rate-limit degradation - when the rate limit is exceeded, requests can be blocked temporarily for a short period
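The timeout and failure-count rules can be combined in one small wrapper. The sketch below is an illustration under assumptions, not the article's implementation: `AutoDegrader` is a hypothetical name, the threshold counts consecutive failures, and `reset()` stands in for the asynchronous recovery check. Production systems usually use a circuit-breaker library (for example Hystrix in Spring Cloud Netflix) rather than hand-written code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch of automatic degradation: a call is degraded when it times out or fails,
// and once consecutive failures reach the threshold, calls go straight to the fallback.
final class AutoDegrader {
    private final int failureThreshold;
    private final long timeoutMillis;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    AutoDegrader(int failureThreshold, long timeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.timeoutMillis = timeoutMillis;
    }

    boolean isOpen() {
        return consecutiveFailures.get() >= failureThreshold;
    }

    <T> T call(Supplier<T> remote, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // failure-count degradation: skip the remote call
        }
        try {
            // Timeout degradation. Note: on timeout the task keeps running in the common pool.
            T result = CompletableFuture.supplyAsync(remote).get(timeoutMillis, TimeUnit.MILLISECONDS);
            consecutiveFailures.set(0);
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            consecutiveFailures.incrementAndGet();
            return fallback.get();
        } catch (ExecutionException | TimeoutException e) { // remote threw, or took too long
            consecutiveFailures.incrementAndGet();
            return fallback.get();
        }
    }

    // Hypothetical hook for the asynchronous recovery check: a background
    // health probe would call this once the remote service responds again.
    void reset() {
        consecutiveFailures.set(0);
    }
}
```

Counting consecutive failures keeps the sketch short; a sliding-window failure rate is a common alternative that reacts less to isolated errors.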

During a flash sale (seckill) or a limited-quantity purchase, the system may crash under heavy traffic, so developers apply rate limiting. Once the rate-limit threshold is reached, subsequent requests are degraded. The degraded response can be a queuing page (divert the user to a queuing page and let them retry after a while), an out-of-stock notice (tell the user directly that the item is sold out), or an error page (the event is too popular, please retry later).
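A minimal sketch of rate-limit degradation, assuming a simple fixed-window counter (the article does not specify an algorithm; `RateLimitDegrader` and its `Outcome` values are hypothetical). Requests within the limit are processed, and the rest receive whichever degraded response was configured: queuing page, out of stock, or error page.

```java
import java.util.function.LongSupplier;

// Sketch of rate-limit degradation with a fixed-window counter.
final class RateLimitDegrader {
    // PROCESSED means the request reaches the real handler; the others are degraded responses.
    enum Outcome { PROCESSED, QUEUE_PAGE, OUT_OF_STOCK, ERROR_PAGE }

    private final int limitPerWindow;
    private final long windowMillis;
    private final LongSupplier clockMillis; // e.g. System::currentTimeMillis; injectable for tests
    private final Outcome degradedOutcome;  // which degraded response to return
    private long windowStart;
    private int count;

    RateLimitDegrader(int limitPerWindow, long windowMillis,
                      LongSupplier clockMillis, Outcome degradedOutcome) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
        this.clockMillis = clockMillis;
        this.degradedOutcome = degradedOutcome;
        this.windowStart = clockMillis.getAsLong();
    }

    synchronized Outcome tryAcquire() {
        long now = clockMillis.getAsLong();
        if (now - windowStart >= windowMillis) { // a new window starts: reset the counter
            windowStart = now;
            count = 0;
        }
        if (count < limitPerWindow) {
            count++;
            return Outcome.PROCESSED;
        }
        return degradedOutcome; // over the threshold: degrade instead of processing
    }
}
```

A fixed window is the simplest option but allows bursts at window boundaries; a token bucket or sliding window smooths traffic better if that matters for the event.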

3.3 Configuration Center

Degradation configuration for microservices is managed centrally and operated through a visual interface. The configuration center and the applications communicate over the network, so network interruptions or restarts can cause pushed configuration to be lost, to go unreceived after a restart or network recovery, or to arrive late. The service degradation configuration center therefore needs the following features so that configuration changes take effect as reliably as possible:

(Figure: Service degradation, configuration center)

Active pull at startup - initializes the configuration (so the application does not have to wait for the first scheduled pull)
Publish/subscribe configuration - applies configuration changes promptly (handles about 90% of configuration changes)
Scheduled pull configuration - covers cases where a publish/subscribe message fails or is lost (handles about 9% of the changes missed by publish/subscribe)
Offline file cache - temporarily covers the case where the application cannot reach the configuration center after a restart
Editable configuration file - configuration can be defined by editing the file directly
Telnet commands for changing configuration - covers the common problem of the configuration center failing so that configuration cannot be changed
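The client side of several of these features can be sketched in one class: active pull at startup, a pull method meant to be scheduled, a push handler for publish/subscribe, and an offline file cache used when the center is unreachable. This is a hypothetical design; `DegradeConfigClient` and the `ConfigSource` interface are assumptions standing in for whatever transport a real configuration center provides.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical client for a degradation configuration center (not a real library API).
final class DegradeConfigClient {
    // Assumed transport to the configuration center; empty when it is unreachable.
    interface ConfigSource {
        Optional<Map<String, String>> fetchAll();
    }

    private final ConfigSource source;
    private final Path cacheFile;
    private final Map<String, String> current = new ConcurrentHashMap<>();

    DegradeConfigClient(ConfigSource source, Path cacheFile) {
        this.source = source;
        this.cacheFile = cacheFile;
    }

    // Active pull at startup; if the center is unreachable, fall back to the offline file cache.
    void start() {
        if (!pull()) {
            loadCache();
        }
    }

    // Scheduled pull: intended to run periodically (e.g. from a ScheduledExecutorService)
    // to pick up changes whose publish/subscribe message was lost.
    boolean pull() {
        Optional<Map<String, String>> remote = source.fetchAll();
        if (!remote.isPresent()) {
            return false;
        }
        current.putAll(remote.get());
        current.keySet().retainAll(remote.get().keySet()); // drop keys removed upstream
        saveCache();
        return true;
    }

    // Publish/subscribe: invoked when the center pushes a single change.
    void onPush(String key, String value) {
        current.put(key, value);
        saveCache();
    }

    String get(String key, String defaultValue) {
        return current.getOrDefault(key, defaultValue);
    }

    // Offline file cache: persist every accepted configuration as a properties file.
    private void saveCache() {
        Properties props = new Properties();
        props.putAll(current);
        try (Writer out = Files.newBufferedWriter(cacheFile)) {
            props.store(out, "degradation config cache");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void loadCache() {
        if (!Files.exists(cacheFile)) {
            return;
        }
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(cacheFile)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        props.forEach((k, v) -> current.put((String) k, (String) v));
    }
}
```

Because the cache is a plain properties file, it also doubles as the "editable configuration file" from the list: an operator could edit it by hand and restart the application.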

3.4 Processing strategy

Once degradation has been triggered and new requests keep arriving, how should we handle them? From the global perspective of the microservice architecture, the following degradation strategies are commonly used:

Page degradation - disable clickable buttons or switch to static pages from the visual interface
Delayed service - for example, delay scheduled tasks, or delay processing of messages after they enter the MQ
Write degradation - directly reject service requests that perform write operations
Read degradation - directly reject service requests that perform read operations
Cache degradation - serve frequently read interfaces from a cache
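Cache degradation is the easiest of these to show in code. The sketch below assumes a read-heavy interface backed by some loader function; `CacheDegradedReader` is a hypothetical name. In normal mode every read goes to the backend and refreshes the cache; in degraded mode reads are served from the last cached value and the backend is not touched at all.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of cache degradation for frequently read interfaces.
final class CacheDegradedReader<K, V> {
    private final Function<K, V> backend; // the real (expensive) read path
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private volatile boolean degraded;    // flipped by a degradation switch

    CacheDegradedReader(Function<K, V> backend) {
        this.backend = backend;
    }

    void setDegraded(boolean degraded) {
        this.degraded = degraded;
    }

    Optional<V> read(K key) {
        if (degraded) {
            // Degraded: possibly stale, possibly missing, but costs the backend nothing.
            return Optional.ofNullable(cache.get(key));
        }
        V value = backend.apply(key);
        if (value != null) {
            cache.put(key, value); // keep the cache warm for a future degradation
        }
        return Optional.ofNullable(value);
    }
}
```

The trade-off is freshness: while degraded, users may see stale data, and keys that were never read before return empty.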

At the back-end code level, degradation is usually handled in one of the following ways:

Throw an exception
Return null
Return mock data
Invoke fallback logic
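The four code-level measures differ only in what the caller receives, so they fit in one helper. This is a hypothetical sketch (`Fallbacks`, `Strategy` and `DegradedException` are illustrative names). In Spring Cloud Netflix, the fallback case is what Hystrix's `@HystrixCommand(fallbackMethod = "...")` annotation provides.

```java
import java.util.function.Supplier;

// Sketch of the four back-end degradation measures listed above.
final class Fallbacks {
    enum Strategy { THROW, RETURN_NULL, MOCK, FALLBACK }

    // Dedicated exception type so callers can tell degradation apart from real bugs.
    static final class DegradedException extends RuntimeException {
        DegradedException(String service) {
            super("Service degraded: " + service);
        }
    }

    static <T> T degrade(String service, Strategy strategy, T mockData, Supplier<T> fallbackLogic) {
        switch (strategy) {
            case THROW:
                throw new DegradedException(service);
            case RETURN_NULL:
                return null;           // caller must handle null explicitly
            case MOCK:
                return mockData;       // canned data, e.g. a default recommendation list
            case FALLBACK:
                return fallbackLogic.get(); // simplified alternative processing
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }
}
```

Throwing or returning null pushes the decision to the caller, while mock data and fallback logic keep the caller unaware that anything was degraded.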

4. Advanced features

Suppose we have built a degradation switch for every service, verified it in production, and it seems to work perfectly.

Scenario 1: One day, the operations team runs a promotion and rushes over to say that traffic is close to the limit. Is there a way to degrade all unimportant services in batches? The developers stare blankly: this is not a database, so where would a batch operation come from?
Scenario 2: Another day, operations comes back: an event is starting soon, so please degrade all unimportant services in advance. The developers are stumped again: how are they supposed to know which services to degrade?
Reflection: Although service degradation works, the experience of operating it was never considered. There are too many services, nobody knows which ones to degrade, and degrading them one at a time is too slow.

4.1 Tiered degradation


When the microservice architecture is under different degrees of stress, we can give up services according to their relative importance (the principle of "sacrificing the chariot to save the general", from Chinese chess), further ensuring that core services keep running.
If you wait until production fails and only then decide, one by one, which services can and cannot be degraded, you will run out of time, because there are hundreds of services online. Sorting this out before every big promotion or flash sale is also a lot of work. It is therefore recommended that architects or core developers evaluate each service during development and record in advance whether it can be degraded to a default value.
To support batch degradation of services across the microservice architecture, we can build a global model of service importance. If possible, use the analytic hierarchy process (AHP) or another mathematical model for qualitative and quantitative evaluation. This is far better than an architect deciding on a hunch, although it is considerably harder and more complex and requires someone with mathematical modeling skills. The basic approach of AHP, decomposing a complex decision into a hierarchy and comparing factors pairwise, roughly mirrors how people think through and judge a complex decision problem.
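The article does not spell out how AHP produces its numbers. As one concrete illustration, the sketch below computes priority weights from a pairwise comparison matrix (entry `a[i][j]` says how much more important criterion `i` is than `j`, with `a[j][i] = 1 / a[i][j]`) using the geometric-mean row method, a common approximation of Saaty's principal-eigenvector method, and then the consistency index CI = (lambdaMax - n) / (n - 1). The class name `Ahp` and the example criteria are hypothetical; the matrix needs at least two rows.

```java
// Sketch: AHP priority weights via the geometric-mean (row) method,
// plus the consistency index of the comparison matrix.
final class Ahp {
    static double[] weights(double[][] a) {
        int n = a.length;
        double[] w = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++) {
            double product = 1;
            for (int j = 0; j < n; j++) {
                product *= a[i][j];
            }
            w[i] = Math.pow(product, 1.0 / n); // geometric mean of row i
            sum += w[i];
        }
        for (int i = 0; i < n; i++) {
            w[i] /= sum; // normalize so the weights sum to 1
        }
        return w;
    }

    // CI = (lambdaMax - n) / (n - 1); 0 means perfectly consistent judgements.
    static double consistencyIndex(double[][] a, double[] w) {
        int n = a.length;
        double lambdaMax = 0;
        for (int i = 0; i < n; i++) {
            double row = 0;
            for (int j = 0; j < n; j++) {
                row += a[i][j] * w[j]; // (A w)_i
            }
            lambdaMax += row / w[i];
        }
        lambdaMax /= n;
        return (lambdaMax - n) / (n - 1);
    }
}
```

For example, with three hypothetical criteria (revenue impact, user impact, dependency fan-out) and the consistent matrix `{{1, 2, 4}, {0.5, 1, 2}, {0.25, 0.5, 1}}`, the weights are 4/7, 2/7 and 1/7 (about 0.571, 0.286, 0.143) and CI is 0. In practice CI is divided by Saaty's random index to get the consistency ratio, and a ratio below 0.1 is usually considered acceptable.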
The following is the evaluation model I propose, which can serve as a reference when designing a service degradation evaluation:
Whether we use mathematical modeling or the architect's own judgment, combined with the priority principle of service degradation and modeled on typhoon warning levels (all of which are storm warnings), the failure storm levels of all services in the microservice architecture can be divided into the following four:
Evaluation model:

Blue storm - non-core services need to be degraded on a small scale
Yellow storm - non-core services need to be degraded on a moderate scale
Orange storm - non-core services need to be degraded on a large scale
Red storm - all non-core services must be degraded

Design notes:

Fault severity increases as: blue < yellow < orange < red
Following the 80/20 (Pareto) principle, services can be split into 80% non-core services and 20% core services

The model above only evaluates degradation for the microservice architecture as a whole. For a specific big promotion or flash sale, it is better to evaluate per theme: events with different themes depend on different services, so degrading different services for each is more reasonable. The same model can be reused, but with different data. Ideally, build a model library so that at execution time you only input the relevant services and it outputs the final degradation plan: the list of services to degrade when a blue storm occurs, the list to degrade when a yellow storm occurs, and so on.
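A minimal sketch of that model-library output, assuming each service is tagged in advance with the lowest storm level at which it may be degraded, and core services are tagged as never degradable. `StormPlan` and its fields are hypothetical names. Because the enum is declared in severity order, comparing ordinals reproduces blue < yellow < orange < red, and at a red storm every non-core service is selected.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of tiered ("storm level") batch degradation.
final class StormPlan {
    // Declared in severity order, so ordinal comparison gives BLUE < YELLOW < ORANGE < RED.
    enum StormLevel { BLUE, YELLOW, ORANGE, RED }

    static final class Service {
        final String name;
        final StormLevel degradeFrom; // null = core service, never degraded

        Service(String name, StormLevel degradeFrom) {
            this.name = name;
            this.degradeFrom = degradeFrom;
        }
    }

    // All non-core services whose threshold is at or below the current storm level.
    static List<String> servicesToDegrade(List<Service> services, StormLevel level) {
        List<String> result = new ArrayList<>();
        for (Service s : services) {
            if (s.degradeFrom != null && s.degradeFrom.compareTo(level) <= 0) {
                result.add(s.name);
            }
        }
        return result;
    }
}
```

This answers both scenarios from section 4: operations names a storm level, and the resulting list can be fed to the distributed switches in one batch.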

4.2 Degradation weight


Microservice architectures already use service weights, mainly for weighted selection during load balancing. A degradation weight works similarly: it provides fine-grained priority when choosing which services to degrade. Simply grouping all services into the four levels above is clearly too coarse. When several services at the same level need to be degraded, in what order should they go? And if we want AI-driven automatic degradation, how do we get finer-grained control?
To support these AI requirements, we can assign each service a degradation weight, enabling more intelligent service governance. Its value can likewise be evaluated qualitatively and quantitatively with a mathematical model, or set directly by the architect based on experience.
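One way to use such weights, sketched under assumptions the article does not state: a lower weight means "less important, degrade first", and each service has an estimated share of capacity that degrading it would free. `WeightedDegradation`, `weight` and `capacityShare` are hypothetical. Candidates are sorted by weight and degraded greedily until the requested capacity is freed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: order candidates by degradation weight and degrade greedily
// until a target amount of capacity has been freed.
final class WeightedDegradation {
    static final class Candidate {
        final String name;
        final double weight;        // assumption: higher = more important, degraded later
        final double capacityShare; // assumption: estimated fraction of capacity freed

        Candidate(String name, double weight, double capacityShare) {
            this.name = name;
            this.weight = weight;
            this.capacityShare = capacityShare;
        }
    }

    static List<String> plan(List<Candidate> candidates, double capacityToFree) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(c -> c.weight)); // least important first
        List<String> result = new ArrayList<>();
        double freed = 0;
        for (Candidate c : sorted) {
            if (freed >= capacityToFree) {
                break; // enough capacity freed; keep the remaining services running
            }
            result.add(c.name);
            freed += c.capacityShare;
        }
        return result;
    }
}
```

Within one storm level this gives a deterministic degradation order, and an automated controller could call `plan` repeatedly as load changes instead of degrading the whole level at once.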

5. Summary and Outlook

The above presents a service degradation plan that is half practice and half theory; readers can choose what fits their company's situation. The author has not yet seen the complete plan implemented anywhere, but it can serve as a long-term direction for service governance. In my opinion, if large Chinese tech companies research and implement the full plan, it will have greater governance value in the coming era of artificial intelligence and the Internet of Everything. Small companies are not advised to adopt such a complex solution given the cost and value, but they can still implement distributed switches and simple tiered degradation.
This article centers on a more ideal approach to degradation management in microservice architectures. It recommends using suitable mathematical models for rational qualitative and quantitative analysis of microservice governance, providing support for future AI-driven microservice governance (Artificial Intelligence Governance Micro Service, AIGMS).

Reprinted from: https://blog.csdn.net/ityouknow/article/details/81230412