Practical methods for taking microservices online and offline gracefully

Introduction

This article introduces practical methods and principles for gracefully taking microservices online and offline, including graceful startup/shutdown logic and service warm-up for Spring applications, plus a demo of lossless shutdown with Docker. It also summarizes the value and challenges of graceful online/offline operations.

About the Author

Yan Songbai

Tencent Cloud Microservice Architect

With more than 10 years of experience in the IT industry, he specializes in software architecture design, microservice architecture, and cloud architecture design, and has extensive microservice architecture experience across industries such as pan-Internet, finance, education, and travel.

Foreword


The principle of gracefully taking microservices online and offline is to keep services stable and available during a release, avoiding traffic interruptions or errors caused by service changes.

Graceful online/offline can be considered from three perspectives:

  • Graceful online on the server side: after the service starts, wait until it is fully ready (possibly including a warm-up phase) before exposing it to external traffic.

  • Lossless offline on the server side: before the service stops, deregister from the registry, reject new requests, and wait for in-flight requests to finish before shutting down.

  • Disaster-recovery strategy on the client side: when invoking services, select healthy instances through load balancing, retries, blacklists, and similar mechanisms, and avoid calling unavailable instances.

Graceful online and offline improves the stability and reliability of microservices and reduces the risks and losses of the release process.

Graceful online


Graceful online is also called lossless online, delayed release, delayed exposure, or service warm-up.

Its purpose is to improve the stability and reliability of a release and to avoid traffic interruptions or errors caused by application changes.

Ways to go online gracefully

There are several common approaches:

  • Delayed release: delay exposing the application's services until initialization is complete, for example until caches, database connection pools, and other resources are in place. Delayed exposure can be implemented through configuration or code.

  • QoS commands: control the online/offline state of application services through the command line or HTTP requests. For example, do not register the service with the registry at startup; register it manually after the health check passes.

  • Service registration and discovery: manage service state and routing information through the registry. For example, register the service at startup and listen for service state change events; deregister when the application stops and notify other services to update their routing information.

  • Gray (canary) release: control traffic distribution through release policies. When releasing a new version, first route part of the traffic to it and observe how it runs; if there are no problems, gradually increase the proportion of traffic until it is all switched to the new version.

All of these methods share one core idea: wait until the service is ready before letting requests through.

How graceful online is implemented

Graceful online is mostly implemented through the registry and service-governance capabilities.

For applications with a long initialization phase, registration usually happens concurrently with initialization, so an application may be registered with the registry, and thus visible to consumers, before it is fully initialized. Calls made at that point may fail.

Therefore, the basic idea of graceful online through service registration and discovery is:

  • When the application starts, provide a health-check endpoint that reports the status and availability of the service.

  • After startup, temporarily prevent new requests from reaching the new instance in one of the following ways:

    • Do not register the service with the registry yet.
    • Isolate the service. Some registries, such as Polaris, support isolating service instances.
    • Set the instance weight to 0.
    • Set the instance's Enable flag to false.
    • Let the health-check endpoint return an unhealthy status.
  • After the new instance finishes initialization and is confirmed available, undo the corresponding measure so that new requests can be routed to it.

  • If warm-up is required, ramp up the traffic to the new instance gradually.

In this way, graceful online ensures that incoming requests do not fail because the new application instance is not yet ready.
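The gating steps above can be sketched in plain Java. This is a minimal illustration with hypothetical names, not a Polaris or Spring API: the instance reports unhealthy and rejects traffic until initialization has finished.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of delayed exposure: the instance only accepts
// traffic after initialization (caches, connection pools, ...) is done.
public class ReadinessGate {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    // Simulates the slow initialization work done at startup.
    public void initialize() {
        // ... warm caches, open connection pools, etc. ...
        ready.set(true); // only now may the registry route traffic to us
    }

    // What the health-check endpoint would report.
    public String health() {
        return ready.get() ? "UP" : "DOWN";
    }

    // What the request path does while the instance is not ready.
    public String handle(String request) {
        if (!ready.get()) {
            return "503 Service Unavailable"; // reject until warm
        }
        return "200 OK: " + request;
    }
}
```

In a real deployment the `ready` flag would be flipped by whatever mechanism the registry supports: late registration, un-isolating the instance, raising the weight from 0, or switching the health check to UP.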

Code demo: graceful online with Polaris

Let's take Spring Cloud and Polaris as an example of going online gracefully through service registration and discovery.

First, we need to create a Spring Cloud project and add Polaris dependencies.

Then, we need to configure the Polaris-related information (registry address, service name, namespace, etc.) in the application.yml file, for example:

spring:
  application:
    name: ${application.name}
  cloud:
    polaris:
      address: grpc://${Polaris server address deployed in step 1}:8091
      namespace: default

Then, we need to create a Controller class that provides a simple endpoint returning service information, for example:

@RestController
public class ProviderController {

    @Value("${server.port}")
    private int port;

    @GetMapping("/hello")
    public String hello() {
        return "Hello, I am provider, port: " + port;
    }
}

Finally, if necessary, we can provide a custom health indicator that reports the status and availability of the service. This requires adding the Spring Boot Actuator dependency.

@Component
public class DatabaseHealthIndicator implements HealthIndicator {

    @Override
    public Health health() {
        if (isDatabaseConnectionOK()) {
            return Health.up().build();
        } else {
            return Health.down().withDetail("Error Code", "DB-001").build();
        }
    }

    private boolean isDatabaseConnectionOK() {
        // check database connections, caches, etc.
        return true;
    }
}

With this, we have a simple service-provider application that registers with and is discovered through Polaris.

Next, we need to create a service-consumer application, again adding the Polaris dependency and configuration.

Then, use a RestTemplate to call the provider's endpoint, for example:

@SpringBootApplication
public class ConsumerApplication {

    public static void main(String[] args) {
        SpringApplication.run(ConsumerApplication.class, args);
    }

    @Bean
    @LoadBalanced // enable client-side load balancing
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }

    @RestController
    public class ConsumerController {

        @Autowired
        private RestTemplate restTemplate;

        @GetMapping("/hello")
        public String hello() {
            // call the provider's endpoint by service name
            return restTemplate.getForObject("http://provider/hello", String.class);
        }
    }
}

Here the @LoadBalanced annotation enables load balancing, and the service name provider is used to call the provider's endpoint.

With this, we have a simple service-consumer application using Polaris for service registration and discovery.

Next, graceful online can be implemented through the following steps:

  • When releasing a new version of the provider application, start the new instance first, but do not register it with the registry, or make the health check return an unhealthy state, so that no new requests reach the new instance. This can be done through configuration or code, for example:

    # do not register the service with the registry
    spring.cloud.polaris.discovery.register=false

    // make the health check return an unhealthy status
    this.isHealthy = false;

  • After the new instance finishes initialization, register it with the registry, or make the health check return a healthy state, so that new requests can be routed to the new instance. For example:

    # register the service with the registry
    spring.cloud.polaris.discovery.register=true

    // make the health check return a healthy status
    this.isHealthy = true;

In this way, graceful online ensures that new requests are only routed to the new version of the application once it is actually ready.

That said, if the requirements for graceful online are not extreme, Polaris supports it out of the box with no extra work: Polaris only registers the service after all Spring beans have been loaded and the Controller is reachable. In most scenarios this already meets the need for graceful online.

Service warm-up

Service warm-up means putting a service into a running state before it takes full traffic, letting it load the necessary resources, establish connections, and so on, so that it can respond to requests quickly once online.

Under heavy traffic, a freshly started service that immediately handles a large number of requests may block or return errors because its internal resources are not fully initialized. Warming up the service, i.e., letting a small amount of traffic help it finish initialization during the startup phase, also helps surface problems early (insufficient resources, too many connections, etc.) so they can be adjusted in time, keeping the service stable and reliable.


Service warm-up with a cloud-native API gateway

Tencent Cloud's cloud-native API gateway is a high-performance, highly available hosted gateway product built on an open-source microservice gateway. Service warm-up can be achieved with a few simple configurations.

First, when creating a new backend service on the gateway, turn on the slow-start switch and set the slow-start duration.


Once enabled, when a new service node comes online, its weight is gradually increased from 1 to the target value over the configured slow-start period, so traffic to the new node ramps up slowly.

If several new nodes come online, all of them go through slow start.

Slow start, i.e., service warm-up, is available for backends sourced from K8s Services, registries, and IP lists.
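The slow-start ramp described above can be approximated with a simple linear weight function. This is a sketch of the idea only; the gateway's actual curve and parameters may differ.

```java
// Linear slow-start: ramp an instance's weight from 1 up to targetWeight
// over warmupSeconds, so a freshly started node receives traffic gradually.
public class SlowStart {
    public static int weight(long secondsSinceStart, long warmupSeconds, int targetWeight) {
        if (secondsSinceStart >= warmupSeconds) {
            return targetWeight; // warm-up finished: full weight
        }
        // start at 1 and grow linearly towards the target value
        long w = 1 + (targetWeight - 1) * secondsSinceStart / warmupSeconds;
        return (int) w;
    }
}
```

For example, with a 60-second slow start and a target weight of 100, a node carries weight 1 at startup, roughly half the target weight after 30 seconds, and the full weight from 60 seconds onward.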

Graceful offline


Lossless offline and graceful offline mean the same thing: avoiding failed requests caused by in-flight requests being dropped when a service goes offline.

Ways to go offline gracefully

Commonly used tools and frameworks for lossless offline include:

  • Dubbo-go: supports multiple registries, load balancing, disaster-recovery strategies, and so on, and can implement graceful online and offline.

  • Spring Cloud: provides components for service configuration, routing, monitoring, circuit breaking, etc.; graceful offline logic can be implemented by listening for the ContextClosedEvent.

  • Docker: a container can be stopped with docker stop or docker kill. The former sends a SIGTERM signal to the container's PID 1 process, the latter a SIGKILL. If the program responds to SIGTERM, it can shut down gracefully.

How Spring Cloud graceful offline works

ContextClosedEvent is an event published by the Spring container when it shuts down. You can listen for it by implementing the ApplicationListener interface and run custom logic in the onApplicationEvent method.

When a Spring Cloud microservice receives a ContextClosedEvent, it can do the following:

  • Deregister the current service from the registry so that no new requests arrive.

  • Reject or delay new requests so that in-flight requests are not interrupted.

  • Wait a bounded amount of time for old requests to finish, or time out.

  • Shut down the service and release resources.

This implements graceful offline and avoids traffic interruptions or errors caused by service changes.
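The drain step, waiting a bounded time for old requests before releasing resources, can be sketched with a plain ExecutorService; this is the same drain-then-stop idea Spring Boot's graceful shutdown applies to the web container, shown here as a self-contained illustration rather than actual Spring code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the drain phase: stop accepting new work, then wait a bounded
// time for in-flight work to complete before shutting down for real.
public class DrainDemo {
    public static int drain() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                try { Thread.sleep(100); } catch (InterruptedException e) { }
                completed.incrementAndGet(); // an "old request" finishing
            });
        }
        pool.shutdown(); // reject new tasks, keep running the old ones
        try {
            boolean finished = pool.awaitTermination(5, TimeUnit.SECONDS);
            if (!finished) {
                pool.shutdownNow(); // timeout exceeded: force-stop (like SIGKILL)
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

The 5-second bound here plays the same role as `timeout-per-shutdown-phase` in the Spring Boot configuration shown later: in-flight work gets a fair chance to finish, but shutdown is never delayed indefinitely.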

Demo: Spring Boot graceful offline

In older versions, we had to implement the TomcatConnectorCustomizer and ApplicationListener<ContextClosedEvent> interfaces: obtain the Tomcat Connector object in the customize method, and listen for the Spring container's close event in onApplicationEvent.

Since version 2.3, graceful shutdown only needs a few lines of configuration in application.yml:

# enable graceful shutdown of the web container (default is IMMEDIATE)
server:
  shutdown: graceful

# maximum wait time per shutdown phase
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

The implementation behind this switch lives in Spring Boot's GracefulShutdown class.

Then add the Actuator dependency and expose Actuator's shutdown endpoint in the configuration:

# expose the shutdown endpoint
management:
  endpoint:
    shutdown:
      enabled: true
  endpoints:
    web:
      exposure:
        include: shutdown

Now a graceful shutdown can be triggered by calling http://localhost:8080/actuator/shutdown, which returns:

{
  "message": "Shutting down, bye..."
}

Advantages and disadvantages

This approach has the following pros and cons:

Advantages :

  • Simple and easy to use: in older versions only two interfaces need to be implemented, and since 2.3 only configuration is required.

  • For Spring Boot applications with Tomcat as the embedded container, no extra configuration or dependencies are needed.

  • It ensures in-flight requests are not interrupted and no new requests get in, avoiding traffic interruptions or errors caused by service changes.

Disadvantages:

  • It only applies to Spring Boot applications with Tomcat as the embedded container; other containers or deployment methods may need additional implementation work.

  • It has to wait for in-flight requests to finish or time out, which can slow down service stop and resource release.

  • If there are too many in-flight requests, or they are too slow, the thread pool may not close gracefully, or the shutdown timeout may be exceeded, resulting in a forced shutdown.

Demo: Docker graceful offline

A simple Node.js application is used here to demonstrate lossless offline with Docker.

First, create a Dockerfile defining a simple application container:

# based on the node:14-alpine image
FROM node:14-alpine

# set the working directory
WORKDIR /app

# copy package.json and package-lock.json
COPY package*.json ./

# install dependencies
RUN npm install

# copy the source code
COPY . .

# expose port 3000
EXPOSE 3000

# start the application
CMD [ "node", "app.js" ]

Then, create an app.js file defining a simple web application:

// import the express module
const express = require('express');

// create the express application
const app = express();

// define an endpoint responding on the /hello path
app.get('/hello', (req, res) => {
  // return the string "Hello, I am app"
  res.send('Hello, I am app');
});

// listen on port 3000
app.listen(3000, () => {
  // log a startup message
  console.log('App listening on port 3000');
});

Next, run the following commands in a terminal to build and run the application container and check the result:

# build the image, tagged app:1.0.0
docker build -t app:1.0.0 .

# run the container as app-1, mapping port 3001 to 3000
docker run -d --name app-1 -p 3001:3000 app:1.0.0

# check container status and port mappings
docker ps

CONTAINER ID   IMAGE       COMMAND                  CREATED          STATUS         PORTS                    NAMES
a8a9f9f7c6c4   app:1.0.0   "docker-entrypoint.s…"   10 seconds ago   Up 9 seconds   0.0.0.0:3001->3000/tcp   app-1

# open http://localhost:3001/hello in a browser: it returns "Hello, I am app"

Now suppose we want to release a new version of the application: modify app.js so the returned string is "Hello, I am app v2".

Then, build and run the new version of the application container:

# build the image, tagged app:2.0.0
docker build -t app:2.0.0 .

# run the container as app-2, mapping port 3002 to 3000
docker run -d --name app-2 -p 3002:3000 app:2.0.0

# check container status and port mappings
docker ps

CONTAINER ID   IMAGE       COMMAND                  CREATED          STATUS         PORTS                    NAMES
b7b8f8f7c6c4   app:2.0.0   "docker-entrypoint.s…"   10 seconds ago   Up 9 seconds   0.0.0.0:3002->3000/tcp   app-2
a8a9f9f7c6c4   app:1.0.0   "docker-entrypoint.s…"   2 minutes ago    Up 2 minutes   0.0.0.0:3001->3000/tcp   app-1

# open http://localhost:3002/hello in a browser: it returns "Hello, I am app v2"

Next, gracefully take the old application container offline: let it finish in-flight requests, stop accepting new ones, and finally exit.

# send SIGTERM to the old application container so it terminates gracefully
docker stop app-1

# check container status and port mappings
docker ps

CONTAINER ID   IMAGE       COMMAND                  CREATED         STATUS         PORTS                    NAMES
b7b8f8f7c6c4   app:2.0.0   "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes   0.0.0.0:3002->3000/tcp   app-2

# opening http://localhost:3001/hello in a browser now fails to connect

With this, we have gone offline gracefully through Docker: in-flight requests are not interrupted, and new requests are routed to the new version of the application.

The key command here is docker stop. It sends the container a SIGTERM signal, a graceful way to terminate a process: the target process gets a chance to clean up, for example to finish in-flight requests and release resources. If the process has not exited after a grace period (10 seconds by default), docker stop sends SIGKILL to terminate it forcibly.

Therefore, the precondition for graceful offline via docker stop is that the application in the container responds correctly to SIGTERM and performs its cleanup after receiving the signal. If the application ignores SIGTERM, or an error occurs during cleanup, docker stop cannot achieve a graceful offline.

How an application in a container responds to SIGTERM largely depends on what PID 1 inside the container is and how it handles signals. If PID 1 is the application itself, the application only needs to register a handler for SIGTERM that performs cleanup and exits. For example, in Node.js:

// handler for the SIGTERM signal
function termHandler() {
  // do the cleanup work
  console.log('Cleaning up...');
  // exit the process
  process.exit(0);
}

// register the handler for SIGTERM
process.on('SIGTERM', termHandler);
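For comparison, the JVM behaves similarly: when a Java application runs as PID 1 and docker stop delivers SIGTERM, the JVM runs its registered shutdown hooks before exiting. A minimal sketch:

```java
// Sketch of SIGTERM handling on the JVM: shutdown hooks run when the JVM
// terminates normally or receives SIGTERM (e.g. from docker stop).
public class SigtermDemo {
    public static Thread registerCleanupHook() {
        Thread hook = new Thread(() -> {
            // finish in-flight work, close pools, flush logs, ...
            System.out.println("Cleaning up...");
        });
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Note that shutdown hooks do not run on SIGKILL, which is one more reason to let docker stop's SIGTERM-plus-grace-period do its job rather than reaching for docker kill.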

Polaris' graceful offline

Polaris' heartbeat interval is 5 seconds by default, and the client's cache is refreshed every 2 seconds by default. In theory, in the worst case there is a 2-second window during which a service that has gone offline can still be called. But clients have retry mechanisms, and most client timeouts are longer than 2 seconds, so in most cases taking a service offline in Polaris is not noticeable to the business.
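The retry behaviour that masks this stale-cache window can be sketched as simple client-side failover. The types here are hypothetical for illustration; real clients such as the Polaris SDK implement this internally.

```java
import java.util.List;
import java.util.function.Function;

// Sketch of client-side failover: try each known instance in turn and
// return the first successful answer, so a briefly stale instance list
// (one entry just went offline) does not surface as a failed request.
public class FailoverClient {
    public static String call(List<String> instances, Function<String, String> invoke) {
        RuntimeException last = null;
        for (String instance : instances) {
            try {
                return invoke.apply(instance); // success: done
            } catch (RuntimeException e) {
                last = e; // this instance just went offline: try the next one
            }
        }
        throw last != null ? last : new RuntimeException("no instances available");
    }
}
```

With this pattern, a request that hits the just-deregistered instance fails fast and is retried against a healthy one, which is why the 2-second cache window usually goes unnoticed.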

Polaris supports graceful offline in several ways; the Spring Boot and Docker approaches above are two of them.

Another is to isolate and deregister the instance in the PreStop hook when the service goes offline.

Such isolation operations can be done manually or automatically through scripts.


An isolated instance is no longer discovered by callers, so no new requests arrive; once the in-flight requests have been processed, the instance can be taken offline.

Summary


The value of graceful online and offline

In microservice practice, graceful online and offline brings the following benefits:

  1. Minimized service interruption: the duration and scope of service interruption are minimized, ensuring availability and stability.

  2. No data loss: graceful offline ensures that in-flight requests complete, avoiding data loss and failed requests.

  3. Better user experience: users do not encounter interruptions or errors while using the service, improving experience and satisfaction.

  4. Simpler deployment: automated tools and processes simplify deployment, reduce manual intervention and errors, and improve deployment efficiency and quality.

  5. Better maintainability: monitoring and logging tools help discover and resolve problems promptly, improving service maintainability and reliability.

These benefits can help enterprises improve service quality and efficiency, and enhance user satisfaction and competitiveness.

The challenges of graceful online and offline

At the same time, graceful online and offline also faces some challenges:

  1. Increased complexity: a microservice architecture usually consists of many services, each with its own life cycle and dependencies, so graceful online/offline must coordinate the interactions between multiple services, increasing system complexity.

  2. Complex deployment process: graceful online/offline relies on automated tools and processes, which take significant time and resources to build and maintain.

  3. Data consistency: graceful offline must ensure that in-flight requests complete, which can create data-consistency problems that need dedicated measures.

  4. High demands on staff skills: microservice architectures require a higher technical level and more development and operations experience, placing high requirements on personnel.

In summary, enterprises need to weigh these challenges carefully and take corresponding measures so that graceful online and offline can be implemented well in microservice practice.

Origin my.oschina.net/u/4587289/blog/10088065