

Introduction

Tair is a distributed key/value storage engine developed by Taobao. It can be used in persistent and non-persistent modes. Non-persistent tair can be regarded as a distributed cache, while persistent tair stores data on disk. To guard against data loss caused by disk damage, tair lets you configure the number of backups of the data, and it automatically places the different backups of a piece of data on different hosts. When a host fails, the remaining backups continue to provide service.


The overall structure of tair

As a distributed system, tair consists of a central control node and a series of service nodes. We call the central control node the config server and the service nodes data servers. The config server is responsible for managing all data servers and maintaining their status information. The data servers provide the various data services to the outside world and report their own status to the config server in the form of heartbeats. The config server is the control point and also a single point, so it is deployed as one master and one standby to ensure its reliability. All data servers have equal status.
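The heartbeat relationship described above can be sketched as follows. This is a minimal illustration, not tair's implementation: the class name `HeartbeatMonitor`, the server names, and the 4-second timeout are all assumptions made for the example.

```python
class HeartbeatMonitor:
    """Hypothetical stand-in for the config server's status tracking."""

    def __init__(self, timeout=4.0):
        # timeout (seconds) is an assumed value, not a tair default.
        self.timeout = timeout
        self.last_seen = {}  # data server name -> time of its last heartbeat

    def heartbeat(self, server, now):
        """Record a heartbeat sent by a data server at time `now`."""
        self.last_seen[server] = now

    def alive(self, now):
        """Data servers whose last heartbeat is within the timeout."""
        return sorted(s for s, t in self.last_seen.items()
                      if now - t <= self.timeout)

    def failed(self, now):
        """Data servers that have been silent for longer than the timeout."""
        return sorted(set(self.last_seen) - set(self.alive(now)))


monitor = HeartbeatMonitor(timeout=4.0)
monitor.heartbeat("ds-a", now=0.0)
monitor.heartbeat("ds-b", now=0.0)
monitor.heartbeat("ds-a", now=3.0)   # ds-b stays silent
print(monitor.alive(now=5.0))        # ['ds-a']
print(monitor.failed(now=5.0))       # ['ds-b']
```

Passing `now` explicitly instead of reading a clock keeps the sketch deterministic and easy to test.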

What is the load balancing algorithm of tair?

tair distributes data with a consistent hashing algorithm. All keys are hashed into Q buckets, and the bucket is the basic unit of load balancing and data migration. The config server assigns each bucket to a data server according to a certain strategy. Because keys are assigned to buckets by hash, the amount of data in each bucket can be considered roughly equal. So as long as the buckets are distributed evenly, the data is distributed evenly as well.
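A minimal sketch of the key-to-bucket-to-server lookup may help here. The hash function (MD5), the bucket count of 1024, the server names, and the round-robin table are all assumptions chosen for illustration; the text does not say which hash or which Q tair actually uses.

```python
import hashlib

BUCKET_COUNT = 1024  # "Q" in the text; this value is an assumption


def bucket_of(key, bucket_count=BUCKET_COUNT):
    """Map a key (bytes) to a bucket id with a stable hash (MD5 is an assumption)."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count


def server_for(key, table):
    """Look up the data server that owns the key's bucket."""
    return table[bucket_of(key, len(table))]


# Hypothetical allocation table as the config server might build it:
# index = bucket id, value = data server. Round-robin is only one simple
# way to spread buckets evenly.
servers = ["ds-a", "ds-b", "ds-c"]
table = [servers[b % len(servers)] for b in range(BUCKET_COUNT)]

print(server_for(b"user:42", table))
```

Because a key is always mapped to the same bucket, moving a bucket from one data server to another only requires changing one entry of the table, which is why the bucket works as the unit of migration.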

What happens when data servers are increased or decreased

When a data server fails and becomes unavailable, the config server detects it. The config server then recalculates the distribution table of buckets across the data servers and reassigns the buckets that were served by the failed machine to other data servers. Data migration may occur at this point. For example, if a bucket that was served by data server A is assigned to B in the new table and B does not hold the data of that bucket, the data is migrated to B. At the same time, the config server finds the buckets whose number of backups has dropped and, according to the load, adds backups of these buckets on the data servers with lower load.

When data servers are added to the system, the config server coordinates the data servers according to the load so that they migrate some of the buckets they control to the new data servers. After the migration is completed, the routing is adjusted. Of course, data servers may be removed and added at the same time; the processing principle is the same as above.

Whenever the distribution table changes, the config server pushes the new configuration information to the data servers. When a client accesses a data server, it sends the version number of the routing table it has cached. If the data server finds that the client's version number is too old, it notifies the client, and the client fetches a new routing table from the config server. If a client finds that a data server is unreachable (the data server may be down), the client fetches a new routing table from the config server on its own initiative.
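The routing-table version check can be sketched as below. Everything here is a simplified model built for illustration: the class and method names, the reply strings, and the single retry after a refresh are assumptions, not tair's actual protocol.

```python
class ConfigServer:
    """Holds the authoritative routing table (bucket id -> server name)."""

    def __init__(self, table):
        self.version = 1
        self.table = list(table)

    def rebuild(self, table, data_servers):
        """Install a new table and push its version to reachable data servers."""
        self.table = list(table)
        self.version += 1
        for ds in data_servers:
            if ds.up:
                ds.version = self.version

    def fetch(self):
        return self.version, list(self.table)


class DataServer:
    def __init__(self, name):
        self.name = name
        self.version = 1
        self.up = True

    def handle(self, bucket, client_version):
        if not self.up:
            raise ConnectionError(self.name + " is unreachable")
        if client_version < self.version:
            return "STALE"  # the client's cached table is too old
        return "OK"


class Client:
    def __init__(self, config, servers):
        self.config = config
        self.servers = servers  # server name -> DataServer
        self.version, self.table = config.fetch()

    def request(self, bucket):
        for _ in range(2):  # first attempt plus one retry after a refresh
            server = self.servers[self.table[bucket]]
            try:
                reply = server.handle(bucket, self.version)
            except ConnectionError:
                reply = "UNREACHABLE"
            if reply == "OK":
                return server.name
            # Stale version or dead server: fetch a new table from the config server.
            self.version, self.table = self.config.fetch()
        raise RuntimeError("no usable route for bucket %d" % bucket)


a, b = DataServer("A"), DataServer("B")
config = ConfigServer(["A", "B"])            # bucket 0 -> A, bucket 1 -> B
client = Client(config, {"A": a, "B": b})
a.up = False                                 # A goes down
config.rebuild(["B", "B"], [a, b])           # config server reassigns bucket 0
print(client.request(0))                     # B
```

In this model the client never polls the config server; it only contacts it when a data server reports a stale version or cannot be reached, which matches the two cases described above.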


How does the data server provide services to the outside world while migration occurs?

Let's take an example. Suppose data server A is to migrate buckets 3, 4, and 5 to data server B. Until the migration is completed, the client's routing table does not change, so requests for buckets 3, 4, and 5 are still routed to A. Now suppose that bucket 3 has not been migrated yet, bucket 4 is being migrated, and bucket 5 has already been migrated. An access to bucket 3 is nothing special and is handled as before. For an access to bucket 5, A forwards the request to B and returns B's result to the client. An access to bucket 4 is processed on A; in addition, if it is a modification, A records a modification log. When the migration of bucket 4 finishes, this log is also sent to B and applied there. Only when the data of bucket 4 on A and B is completely consistent is the migration truly complete.

When the migration was triggered by a data server going down, the client receives an intermediate, temporary allocation table. In this table, the buckets that the downed data server was responsible for are temporarily assigned to the data servers holding their backups. At this point the service is available, but the load may not be balanced. When the migration is completed, a new balanced state is reached again.
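The three cases for buckets 3, 4, and 5 can be modelled with a small sketch. It is an in-memory illustration under stated assumptions: the class names, the `State` enum, and the snapshot-then-replay-log approach are hypothetical and only mirror the behaviour described above, not tair's code.

```python
from enum import Enum


class State(Enum):
    NOT_MIGRATED = 1
    MIGRATING = 2
    MIGRATED = 3


class Server:
    def __init__(self, name):
        self.name = name
        self.data = {}  # bucket id -> {key: value}

    def put(self, bucket, key, value):
        self.data.setdefault(bucket, {})[key] = value
        return self.name  # name of the server that stored the value


class MigratingSource(Server):
    """Data server A while it hands buckets over to a target server B."""

    def __init__(self, name, target, buckets):
        super().__init__(name)
        self.target = target
        self.state = {b: State.NOT_MIGRATED for b in buckets}
        self.log = {}  # bucket id -> modifications made during migration

    def put(self, bucket, key, value):
        state = self.state.get(bucket)
        if state is State.MIGRATED:
            # Already moved: forward to B and return B's result.
            return self.target.put(bucket, key, value)
        stored_on = super().put(bucket, key, value)
        if state is State.MIGRATING:
            # Being moved: serve locally and remember the change for B.
            self.log.setdefault(bucket, []).append((key, value))
        return stored_on

    def start_migration(self, bucket):
        self.state[bucket] = State.MIGRATING
        self.target.data[bucket] = dict(self.data.get(bucket, {}))  # bulk copy

    def finish_migration(self, bucket):
        for key, value in self.log.pop(bucket, []):
            self.target.put(bucket, key, value)  # replay the log on B
        self.state[bucket] = State.MIGRATED


b = Server("B")
a = MigratingSource("A", b, buckets=[3, 4, 5])
a.put(4, "k", "v1")
a.start_migration(4)
a.put(4, "k", "v2")           # logged, because bucket 4 is being migrated
a.finish_migration(4)
print(b.data[4] == a.data[4])  # True
```

The log is what closes the gap between the bulk copy and the writes that arrive while the copy is running.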


Strategies for distributing buckets across the data servers

The program provides two strategies for generating the allocation table: load balancing priority and location security priority.

Let's look at the load balancing priority strategy first. With this strategy, the config server tries to distribute the buckets across the data servers as evenly as possible. "As evenly as possible" means balancing the load as far as possible without violating the following principles:

1. Each bucket must have COPY_COUNT copies of its data.
2. The copies of a bucket must not be on the same host.

With the location security priority strategy, the location security condition must be met in addition to the two principles above, and only then is load balancing considered. Location information is obtained through _pos_mask (see the explanation of configuration items in the installation and deployment documentation). Generally, we set _pos_mask so that different computer rooms have different location information. When location security has priority, one more condition is added: the copies of a bucket must not all be in the same location (that is, not all in the same computer room).

This raises a problem. Suppose there are only two computer rooms, with 100 data servers in room 1 and a single data server in room 2. The load on the data server in room 2 will inevitably be very high. For this reason there is a control parameter, _build_diff_ratio (see the installation and deployment documentation). When the difference ratio of the computer rooms is greater than this configured value, the config server no longer builds a new table.

How is the difference ratio calculated? First find the computer room with the most machines; call it RA and let SA be its number of data servers. The number of all other data servers is recorded as SB. Then:

difference ratio = |SA - SB| / SA

The COPY_COUNT of our online systems is generally 3. In that case, with only two computer rooms RA and RB, how many data servers should each room have for the two rooms to be balanced? When the difference ratio is less than 0.5, the load of each data server can be completely balanced.

One thing to note: assume there are 6 machines in room RA and 3 machines in RB. Then the difference ratio = (6 - 3) / 6 = 0.5. If a data server is now added to room RA, the difference ratio after the expansion = (7 - 3) / 7 ≈ 0.57. In other words, adding data servers only to the room that already has more machines increases the difference ratio. If _build_diff_ratio is configured as 0.5, the config server will refuse to build new tables after this expansion.
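The difference-ratio check is easy to reproduce. The sketch below follows the formula in the text; the function names and the dictionary of room sizes are assumptions made for the example, and only `_build_diff_ratio` corresponds to a configuration item named above.

```python
def diff_ratio(room_sizes):
    """Difference ratio |SA - SB| / SA for a mapping room name -> server count.

    SA is the size of the largest room, SB the number of all other servers.
    """
    sa = max(room_sizes.values())
    sb = sum(room_sizes.values()) - sa
    return abs(sa - sb) / sa


def can_build_table(room_sizes, build_diff_ratio):
    """The config server stops building tables once the ratio exceeds the limit."""
    return diff_ratio(room_sizes) <= build_diff_ratio


before = {"RA": 6, "RB": 3}
after = {"RA": 7, "RB": 3}   # one more data server added to the larger room

print(diff_ratio(before))              # 0.5
print(round(diff_ratio(after), 2))     # 0.57
print(can_build_table(before, 0.5))    # True
print(can_build_table(after, 0.5))     # False
```

Adding the new data server to RB instead gives (6 - 4) / 6 ≈ 0.33, so expanding the smaller room lowers the ratio.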


Consistency and reliability of tair

Reliability and consistency cannot both be guaranteed at the same time in a distributed system, because we must allow network errors to occur. tair uses replication to improve reliability, with some optimizations for efficiency. In fact, when no errors occur, tair provides strong consistency. However, when a data server fails, the client may be unable to read the latest data within a certain time window, and the latest data may even be lost.


The client provided by tair

The server side of tair is written in C++. Because the server and the client communicate over sockets, in theory any language that can perform socket operations can be used to implement a tair client directly. The clients currently provided are Java and C++. A client only needs to know the location of the config server to use the services provided by the tair cluster.




