[Open Source] TSeer: A Lightweight Implementation of the Tars Name Service

Author: Zhong Ke

1. Introduction to TSeer

TSeer is a fault-tolerant solution for service registration and discovery, and a lightweight version of the Tars name service. It is widely used in Tencent businesses such as Tencent Browser, App Store, Butler, Mobile Bookstore, Tencent Literature, and Guangdiantong, and currently carries tens of billions of requests per day.

TSeer is lightweight and flexible, is minimally intrusive to business code, and can also be used seamlessly by non-Tars services. On top of its core service discovery function, TSeer supports several load balancing algorithms and provides reliable fault tolerance strategies. These address problems such as cross-region and cross-data-center calls and greatly improve service availability and call quality, making TSeer an excellent name service solution for microservice frameworks.

TSeer offers two management methods, a web management interface and API access, and users can choose either according to their needs. Through its proxy node and proxy server mechanism, it provides transparent service discovery for businesses that publish changes frequently. The learning cost is low and operation is convenient, which makes it friendly to business maintenance personnel.

TSeer, the name service solution for microservice frameworks, is now officially open source. The GitHub address is: https://github.com/Tencent/Tseer

2. R&D background

In traditional monolithic applications, changes are released relatively rarely and the network locations in the system seldom change, so occasional changes can be handled by editing the configuration manually. With today's massive service volumes, however, this architecture can no longer support fast-growing businesses efficiently and stably. Ever larger distributed service clusters and microservice frameworks have gradually become mainstream.

While the new architecture supports the business better, frequent releases and dynamic scaling also cause network locations to change frequently. Having business maintenance personnel change configuration by hand, repeatedly and at large scale, not only increases the risk of error but is also so inefficient that it limits how fast the business can develop. Often one configuration change has not been finished before the next change needs to be released. An automated service discovery tool is therefore needed to solve these problems.

These are not the only problems. Once calls succeed reliably, response time is the most important indicator of service quality and the most critical factor affecting business development. Because of complex calling relationships between services and factors such as cross-region and cross-network calls, response times often fail to meet expectations, a thorny problem that persists throughout the business development cycle. In addition, whether physical or virtual machines are used, nodes go down from time to time and make services unavailable, so effective fault tolerance is also an urgent need. We developed TSeer to address these problems.

3. TSeer Architecture

The TSeer architecture is divided into four parts: TseerServer, the business client (the caller), the business server (the callee), and web management.

• TseerServer

TseerServer is the hub and core module of TSeer. When a new node goes online, it must first be registered in the TSeer service cluster through the web management platform so that its network location is recorded in the TSeer system. Taking a node offline or otherwise modifying it is also done on the web management platform. Each callee node regularly reports a heartbeat to TseerServer, and the server blocks any node whose heartbeat has timed out so that it cannot be called.
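The heartbeat rule can be pictured with a small Python sketch. It is only an illustration and not TSeer's code: the `HeartbeatTable` class, its method names, and the 30-second timeout are assumptions made for this example (the article does not state the timeout value).

```python
import time


class HeartbeatTable:
    """Tracks the last heartbeat of each callee node (hypothetical helper)."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        # timeout_seconds is an assumed value; the article does not give one.
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_seen = {}

    def report(self, node):
        """Record a heartbeat from a node."""
        self.last_seen[node] = self.clock()

    def callable_nodes(self):
        """Return the nodes whose heartbeat has not timed out, sorted."""
        now = self.clock()
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen <= self.timeout_seconds
        )
```

A node that stops reporting simply drops out of `callable_nodes()` once the timeout has passed, and it reappears as soon as it reports again.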

• Business Client

The business client is the node that needs to call other services. It is referred to as the caller and is the user of the service discovery function. TSeer gives business clients two ways to obtain the address of the service to be called (the callee) from TseerServer and complete the call: installing the Agent, or calling the API.

• Business server

The business server is the node that is called. It is referred to as the callee and is the provider of the service. When a new node goes online, it must be registered with TseerServer. However many nodes a service cluster contains, the whole cluster is registered under one unified name. The caller only needs to specify the name of the service in its calling logic, and TSeer returns callee addresses for that name. To expand capacity, new nodes are simply added under the name of the service, which makes it convenient for business personnel to manage the large number of nodes in a callee cluster.
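The idea of one name covering many nodes can be sketched as a simple mapping. The `NameRegistry` class below is hypothetical and only illustrates the concept; in TSeer itself registration goes through TseerServer and the web platform, and the service name and addresses in the usage note are made up.

```python
from collections import defaultdict


class NameRegistry:
    """Maps one service name to the addresses of all its nodes (hypothetical)."""

    def __init__(self):
        self._nodes = defaultdict(list)

    def register(self, service_name, address):
        """Add a node under a service name; scaling out is just another call."""
        if address not in self._nodes[service_name]:
            self._nodes[service_name].append(address)

    def unregister(self, service_name, address):
        """Remove a node that has gone offline."""
        if address in self._nodes[service_name]:
            self._nodes[service_name].remove(address)

    def lookup(self, service_name):
        """Return a copy of the node list registered under the name."""
        return list(self._nodes.get(service_name, []))
```

For example, after registering `10.0.0.1:9000` and `10.0.0.2:9000` under the made-up name `Demo.HelloServer`, a caller that looks up `Demo.HelloServer` receives both addresses without knowing either of them in advance.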

• Web management

Business information and node routing information are added, deleted, and modified through the web management interface, which is simple, fast, and intuitive. Even updates of the Agent installation package can be released through the web platform. For detailed usage, please refer to the usage documentation of the TSeer project on GitHub.

4. TSeer Features

1. Load balancing

When some nodes in a business cluster are called frequently while others carry less than their share of the load, service quality and response time degrade significantly and resources are wasted.

When the caller initiates a call, TSeer applies one of four load balancing methods across all available nodes under the callee name so that every node carries a reasonable load:

• Round-robin

• Random

• Static weights

• Consistent hashing

Users can also implement custom load balancing through call grouping, which is described below.
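To make the four strategies concrete, here is a minimal Python sketch of each. It shows the general techniques only and is not TSeer's implementation: every function and class name is hypothetical, and the use of MD5 with 100 virtual points per node in the hash ring is an assumption chosen for the example.

```python
import bisect
import hashlib
import itertools
import random


def round_robin(nodes):
    """Return an iterator that yields nodes in a fixed rotating order."""
    return itertools.cycle(nodes)


def pick_random(nodes, rng=random):
    """Pick any node with equal probability."""
    return rng.choice(nodes)


def pick_weighted(weights, rng=random):
    """Pick a node with probability proportional to its static weight."""
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]


def _hash(value):
    # MD5 is used only to spread keys evenly, not for security.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing: the same key always maps to the same node."""

    def __init__(self, nodes, replicas=100):
        # Each node gets several virtual points to even out the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def pick(self, key):
        """Return the first node clockwise from the key's position."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]
```

Round-robin and random selection suit stateless services, static weights let stronger machines take more traffic, and consistent hashing keeps requests for the same key on the same node, which helps when nodes hold per-key caches.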

2. Fault tolerance

To deal with unavailability and reduced service quality caused by node failures, TSeer also provides a reliable fault tolerance mechanism.

After the caller makes a call, it reports the call result. If calls to a node fail, TSeer temporarily blocks the node to prevent the faulty node from being called repeatedly. TSeer periodically probes blocked nodes and reactivates a node once it finds that the node is serving again.

A callee node is blocked if either of the following conditions is met:

1. Within one detection period (60 seconds), the number of failed calls reaches 2 and failed calls account for more than 50% of all calls

2. Calls fail more than 5 times in a row within 5 seconds

The TSeer Agent/API retries a blocked node every 30 seconds.
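The two blocking conditions and the 30-second retry can be sketched as a small per-node state machine. The thresholds come from the text, but everything else is an assumption made for illustration: the `NodeHealth` class and its method names are hypothetical, the 60-second detection period is modeled as a sliding window, and a single successful call is taken to reactivate the node.

```python
from collections import deque


class NodeHealth:
    """Per-node call statistics and blocking state (illustrative only)."""

    PERIOD = 60.0          # detection period in seconds
    MIN_FAILURES = 2       # failures needed within one period
    MAX_ERROR_RATE = 0.5   # error share that must be exceeded
    STREAK_LIMIT = 5       # consecutive failures that must be exceeded
    STREAK_WINDOW = 5.0    # seconds the failure streak must fit into
    RETRY_INTERVAL = 30.0  # seconds between retries of a blocked node

    def __init__(self):
        self.calls = deque()   # (timestamp, succeeded) within the period
        self.streak = deque()  # timestamps of the current failure streak
        self.blocked = False
        self.next_retry = 0.0

    def report(self, succeeded, now):
        """Record one call result and update the blocking state."""
        self.calls.append((now, succeeded))
        while self.calls and now - self.calls[0][0] > self.PERIOD:
            self.calls.popleft()
        if succeeded:
            self.streak.clear()
            self.blocked = False  # assumed: one success reactivates the node
            return
        self.streak.append(now)
        while self.streak and now - self.streak[0] > self.STREAK_WINDOW:
            self.streak.popleft()
        failures = sum(1 for _, ok in self.calls if not ok)
        too_many_errors = (
            failures >= self.MIN_FAILURES
            and failures / len(self.calls) > self.MAX_ERROR_RATE
        )
        long_streak = len(self.streak) > self.STREAK_LIMIT
        # A failed retry on an already blocked node keeps it blocked.
        if self.blocked or too_many_errors or long_streak:
            self.blocked = True
            self.next_retry = now + self.RETRY_INTERVAL

    def may_call(self, now):
        """Healthy nodes are always callable; blocked ones only for a retry."""
        return not self.blocked or now >= self.next_retry
```

A caller would check `may_call(now)` before choosing a node and call `report(...)` with the outcome afterwards, so a blocked node receives one probe call per retry interval until it succeeds.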

If TSeer itself fails, the caller can still continue to make calls using cached information.

3. Call optimization

TSeer provides three calling modes, All, IDC grouping, and Set grouping, to solve problems such as cross-region calls.

• All

Returns the addresses of all available callee nodes to the caller.

• IDC grouping

IDC grouping can be roughly understood as nearest-node access.

Grouping has two levels. The first is the physical group, the smallest scheduling unit: nodes are assigned a group name according to the data center or area in which they are located. The second is the logical group, which is composed of physical groups and can be understood as a group name covering a larger area.

For IDC logical groups, TSeer also defines a call priority policy: when some logical groups are unavailable, a list of available callee addresses is returned according to the priority policy.
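One way to picture the priority policy is a lookup that tries the caller's own logical group first and then walks a fallback list. This is a sketch under assumptions rather than TSeer's algorithm: the function name, the shape of the `priority` mapping, and the group names used in the usage note are all hypothetical.

```python
def select_by_idc(caller_group, priority, nodes_by_group):
    """Return callee addresses from the best available logical group.

    caller_group:   logical group of the caller
    priority:       maps a logical group to its fallback groups, nearest first
    nodes_by_group: maps a logical group to its currently available nodes
    """
    candidates = [caller_group] + priority.get(caller_group, [])
    for group in candidates:
        nodes = nodes_by_group.get(group, [])
        if nodes:
            # The first group with available nodes wins.
            return list(nodes)
    return []
```

With a made-up priority of `{"south": ["east", "north"]}`, a caller in `south` gets nodes in `south` while any are available, then nodes in `east`, and only then nodes in `north`.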

• Set grouping

IDC grouping divides nodes by region in order to implement nearest-node access. In a back-end service architecture, once the business reaches a certain scale, you may want to isolate and control certain service nodes by capacity, grayscale release, or region. IDC grouping cannot satisfy this need; Set grouping is a further refinement of IDC grouping.

Set names follow the rule: Set name.Set area.Set group. The Set group is the smallest unit of distinction and takes values such as 0,1,2,3,4 or a,b,c,d. The wildcard * is also supported and stands for all groups in the Set area.

The calling logic of Set grouping is as follows:

1. Set grouping is considered to be in effect within the same Set only when both the caller (client) and the callee (server) have enabled it and their Set names are the same.

2. A caller and callee that have both enabled Set grouping can only access nodes in the same Set.

3. If the caller has enabled Set grouping but the callee has not, the query falls back to IDC grouping by default (provided that IDC grouping is enabled).
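The three rules can be expressed as a short selection function. This is one possible reading of the rules, written as a sketch: the function names are hypothetical, a wildcard on either side is assumed to match any group, and the IDC query is passed in as a plain callback.

```python
def same_set(caller_set, callee_set):
    """True if two 'name.area.group' IDs match; '*' matches any group."""
    c_name, c_area, c_group = caller_set.split(".")
    s_name, s_area, s_group = callee_set.split(".")
    if (c_name, c_area) != (s_name, s_area):
        return False
    return c_group == s_group or "*" in (c_group, s_group)


def select_by_set(caller_set, callees, idc_fallback):
    """Pick callee addresses for a caller that has Set grouping enabled.

    caller_set:   the caller's Set ID, e.g. "app.sz.1"
    callees:      list of (address, set_id) pairs; set_id is None when the
                  node has not enabled Set grouping
    idc_fallback: zero-argument function implementing the IDC-group query
    """
    caller_name = caller_set.split(".")[0]
    in_named_set = [
        (address, set_id)
        for address, set_id in callees
        if set_id is not None and set_id.split(".")[0] == caller_name
    ]
    if not in_named_set:
        # Rule 3: the callee side has not enabled this Set, so use IDC groups.
        return idc_fallback()
    # Rules 1 and 2: only nodes in the caller's own Set are visible.
    return [a for a, s in in_named_set if same_set(caller_set, s)]
```

Note that under this reading a caller in `app.bj.1` gets an empty list when callees exist only in `app.sz.*`; it does not silently fall back to another Set, which is what makes Sets useful for isolation.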

4. Two access methods

Depending on whether the business client deploys the TSeer Agent on its own machine, TSeer can be used in two modes, Agent and TSeer API:

• Agent mode

Name routing

In Agent mode, the TSeer Agent periodically caches callee information and returns callee node information to the caller according to the load balancing policy the caller specifies. If the caller wants to implement load balancing itself based on service characteristics, TSeer can instead return the callee group information selected by the grouping strategy the caller specifies.

Data reporting

After each call completes, the caller must call the reporting interface provided by the TSeer API to report the call information, which the TSeer API forwards to the TSeer Agent. The TSeer Agent removes invalid callee nodes based on this information.

Fault tolerance

In Agent mode, if the TSeer Agent fails, the TSeer API returns previously visited nodes from its in-memory cache to the caller. If the in-memory cache is also unavailable, the TSeer API restores it from the cache file on the local disk and provides that information to the caller. Note that in this situation the information the TSeer API provides is degraded: the TSeer API does not guarantee that the nodes are healthy.

• TSeer API mode

Name routing

The difference between Agent mode and TSeer API mode is whether the TSeer Agent has to be deployed on the caller's host. In TSeer API mode, the TSeer API accesses TseerServer directly, and the caching of callee information, load balancing, and removal of invalid nodes are all done inside the TSeer API.

The TSeer API periodically pulls callee information from TseerServer and blocks unavailable callee nodes.

Fault tolerance

If TseerServer fails, the TSeer API returns the information cached in memory to the caller. If the in-memory cache is unavailable, the TSeer API restores it from the cache on the local disk.
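The fallback order (server first, then memory, then local disk) can be sketched as follows. The `CachedResolver` class, the JSON cache file format, and the use of `OSError` to signal an unreachable server are all assumptions made for this illustration; they do not describe the real TSeer API.

```python
import json
import os


class CachedResolver:
    """Resolves node lists with memory and disk fallbacks (illustrative)."""

    def __init__(self, fetch, cache_path):
        self.fetch = fetch            # function: service name -> list of nodes
        self.cache_path = cache_path  # local JSON file mirroring the cache
        self.memory = {}

    def resolve(self, service):
        """Ask the server first; fall back to cached data if it is down."""
        try:
            nodes = self.fetch(service)
        except OSError:
            return self._from_cache(service)
        self.memory[service] = nodes
        with open(self.cache_path, "w", encoding="utf-8") as handle:
            json.dump(self.memory, handle)
        return nodes

    def _from_cache(self, service):
        """Use the in-memory cache, restoring it from disk when needed."""
        if service not in self.memory and os.path.exists(self.cache_path):
            with open(self.cache_path, encoding="utf-8") as handle:
                self.memory.update(json.load(handle))
        # Cached nodes may be stale; their health is not guaranteed.
        return self.memory.get(service, [])
```

Because the disk file is rewritten after every successful fetch, a freshly restarted process can still resolve names it has seen before even while the server is unreachable.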

Comparison between Agent mode and TSeer API mode
