Analysis of ZNBase Clock Synchronization Technology: Atomic Clock Realizes True-time Mechanism

Guided reading

In a distributed database system, clock synchronization is essential for ordering events across clusters and nodes. This article introduces several mainstream clock synchronization solutions in the industry, as well as the True-time mechanism that the distributed database ZNBase implements on top of atomic clock technology.

Clocking Solutions in the Industry

In the current mainstream distributed database systems in the industry, different clock synchronization schemes are adopted.

TiDB and OceanBase, both popular in China, use the Timestamp Oracle (TSO) solution, a centralized timestamp allocation scheme. TSO implements a global clock with a single time source that issues timestamps from a single point, and it uses the globally unique timestamp as the global transaction ID. TiDB's transaction model is based on Google Percolator, so it has used a Percolator-style TSO mechanism from the beginning. TiDB allocates timestamps centrally through its cluster management module, PD, to keep time globally consistent across the system. The advantage of this approach is that it is simple to implement and the network overhead is very small within a single data center, but latency is high in cross-region deployments. In addition, a centralized timestamp service can become the performance bottleneck of the entire system, and a single point of failure in that service directly makes the entire distributed cluster unavailable.
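As a rough illustration of centralized timestamp allocation, the sketch below hands out strictly increasing timestamps from a single process. It is not TiDB's or PD's code: the class name `TimestampOracle`, the 18-bit logical counter width, and the injectable `now_ms` parameter are all assumptions made for this example.

```python
import threading
import time


class TimestampOracle:
    """Hypothetical single-point timestamp allocator (illustrative only)."""

    LOGICAL_BITS = 18  # assumed width of the per-millisecond counter

    def __init__(self, now_ms=lambda: int(time.time() * 1000)):
        self._now_ms = now_ms          # physical time source, in milliseconds
        self._lock = threading.Lock()  # every caller funnels through one lock
        self._last = 0                 # last timestamp handed out

    def next_timestamp(self):
        """Return a timestamp strictly greater than every earlier one."""
        with self._lock:
            candidate = self._now_ms() << self.LOGICAL_BITS
            # If the wall clock has not advanced (or went backwards),
            # fall back to incrementing the previous value.
            self._last = max(candidate, self._last + 1)
            return self._last


oracle = TimestampOracle()
first, second = oracle.next_timestamp(), oracle.next_timestamp()
print(first < second)
```

Every transaction in the cluster has to call this one allocator, which is where the bottleneck and single-point-of-failure concerns come from.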

In addition, CockroachDB uses HLC as its clock synchronization scheme, and Google's Spanner uses a True-time mechanism backed by atomic clock + GPS hardware to support global, low-latency deployment.

Understanding Clock Synchronization

The core job of a database is to order the operations of each transaction. In a traditional stand-alone architecture, transactions can easily be ordered by log sequence numbers or transaction IDs. In a distributed architecture, however, the database runs on multiple servers, each database instance has its own independent clock or log sequence number (LSN), and the clocks of different servers do not agree, so the local clocks of a distributed database cannot reflect the global order of transactions. Clock synchronization, one of the key technologies of distributed databases, exists to solve this problem of ordering events.

With the continuous expansion of the scale of distributed systems, the time synchronization problem of different clusters and different nodes becomes extremely complex. At present, the clock synchronization schemes commonly used in the distributed database industry include five categories: physical clocks, logical clocks, vector clocks, hybrid logical clocks, and True-Time mechanisms. 

1. Physical Clock (PT)

The physical clock is the machine's local clock. Because hardware differs from device to device, local clocks drift: the error accumulated in a day may be milliseconds or even seconds. The clocks of different machines therefore need to be synchronized so that time is relatively consistent across machines. Usually, centralized global time synchronization is used to keep the time of each node consistent.

NTP is the most commonly used way of synchronizing time. NTP (Network Time Protocol) is a network protocol for synchronizing computer clocks. It lets a computer synchronize with a server or clock source (such as a quartz clock or GPS) and provides fairly accurate time correction. It follows a client/server architecture: an NTP client on each machine synchronizes with the NTP server and calibrates the local time.
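The client-side calibration can be sketched with the standard four-timestamp calculation used by NTP. The function name and the sample numbers below are made up for illustration. The estimate assumes the network delay is the same in both directions, which is one reason the accuracy degrades on asymmetric or congested paths.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t0: request sent      (client clock)
    t1: request received  (server clock)
    t2: reply sent        (server clock)
    t3: reply received    (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # how far the client is behind the server
    delay = (t3 - t0) - (t2 - t1)         # time spent on the network, both ways
    return offset, delay


# Made-up example in milliseconds: the client is 50 ms behind the server,
# each network hop takes 10 ms, and the server spends 1 ms on the request.
offset, delay = ntp_offset_and_delay(t0=1000, t1=1060, t2=1061, t3=1021)
print(offset, delay)
```

The client would then adjust its local clock by `offset`.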

However, when NTP is used for time synchronization in a distributed database, an error of 100-500 ms can remain, which causes a large clock error between distributed nodes and hurts the database's performance under high concurrency.

2. Logical Clock (LC)

A logical clock, a concept proposed by Turing Award winner Leslie Lamport in 1978, is a time mechanism used to distinguish the order of events in distributed systems. The logical clock does not have any central node to generate time. Each node has its own local logical time. The logical order of transactions is determined mainly through the happened-before relationship, thereby determining the partial order relationship of transactions.

For example, take two different instances. Node A first generates transaction 1, which does not involve any other node. Node A's logical clock is incremented by 1, and the other nodes remain unchanged. Node A then generates transaction 2, which does involve node B. Node A's logical clock is incremented by 1 again, and node B's logical clock jumps ahead to catch up with node A's (under Lamport's rule, to the larger of the two clocks plus one). Linking the two clocks in this way synchronizes the logical clocks of the two nodes and establishes a happened-before order between them: the transaction on node A completed before the new transaction on node B began.
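The two-node example can be traced with a minimal Lamport clock. This is a sketch of the textbook rules (increment on a local or send event, take the maximum plus one on receive). The class and variable names are chosen for this illustration only.

```python
class LamportClock:
    """Minimal Lamport logical clock for a single node."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance the clock by one."""
        self.time += 1
        return self.time

    def receive(self, remote_time):
        """Message receive: move past both our clock and the sender's."""
        self.time = max(self.time, remote_time) + 1
        return self.time


node_a, node_b = LamportClock(), LamportClock()
t1 = node_a.tick()        # transaction 1 touches only node A
t2 = node_a.tick()        # transaction 2 is sent from node A to node B
t3 = node_b.receive(t2)   # node B jumps past node A's timestamp
print(t1, t2, t3)
```

Because `t2 < t3`, the timestamps agree with the happened-before order between the two nodes.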

In a distributed database, two transactions are unrelated if they do not operate on the same node, and unrelated transactions can be treated as unordered. However, when a transaction spans multiple nodes, the transactions on those nodes become related, and a causal order between them is established.

The main problem of the logical clock is that it only guarantees the order of causally related events, such as events within the same process. For concurrent events in different processes, the timestamps alone cannot establish a meaningful total order, which is likely to cause conflicts.

3. Vector Clock (VC)

The vector clock is another logical clock method that evolved from Lamport's algorithm. Using a vector structure, it records not only the Lamport timestamp of the local node but also the Lamport timestamps of the other nodes, so it can describe both concurrency and causality between events.

The vector clock stores a vector at each node, and each element of the vector is the logical clock of one node. In essence, it resolves logical clock conflicts by keeping a copy of every node's clock. However, since the dimension of the clock vector equals the number of nodes, the space complexity is high, which affects the efficiency of database storage and transmission.
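A small sketch shows how the per-node vector distinguishes causal from concurrent events. The dict-based representation and the function names `vc_increment`, `vc_merge`, and `vc_compare` are assumptions of this example, not taken from any particular database.

```python
def vc_increment(clock, node):
    """Return a copy of `clock` with `node`'s own entry advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def vc_merge(local, remote, node):
    """On receive: element-wise maximum, then count the receive event."""
    nodes = set(local) | set(remote)
    merged = {n: max(local.get(n, 0), remote.get(n, 0)) for n in nodes}
    return vc_increment(merged, node)


def vc_compare(a, b):
    """Classify a against b: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


event_a = vc_increment({}, "A")             # independent event on node A
event_b = vc_increment({}, "B")             # independent event on node B
event_c = vc_merge(event_b, event_a, "B")   # node B receives A's message
print(vc_compare(event_a, event_b), vc_compare(event_a, event_c))
```

Each timestamp carries one entry per node, which is the O(n) storage and transmission cost mentioned above.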

4. Hybrid Logical Clock (HLC)

Hybrid logical clocks are a solution that combines physical and logical clocks. 

The hybrid logical clock splits the timestamp into two parts: the high-order bits hold the physical clock and the low-order bits hold the logical clock. In other words, a logical clock is layered on top of the physical clock to correct it within a bounded deviation, so that the two agree as closely as possible. Its core properties are the following four points:

(a) It satisfies the happened-before causal consistency of LC.

(b) The storage space of a single clock is O(1) (VC is O(n), where n is the number of nodes in the distributed system).

(c) The size of a single clock has definite bounds (not infinity).

(d) As close as possible to the physical clock PT, that is, the difference between HLC and PT has a definite boundary.

The hybrid logical clock is 64 bits in total: the first 32 bits represent the physical clock, and the last 32 bits are used for the logical counter.
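The properties above can be sketched with the update rules from the original HLC paper (Kulkarni et al.), combined with the 32/32-bit layout described here. This is an illustration, not ZNBase's or CockroachDB's implementation. Using whole seconds for the physical part (so that it fits in 32 bits), the injectable `physical_now` parameter, and the `pack` helper are all assumptions of this example.

```python
import time


class HybridLogicalClock:
    """Sketch of an HLC: l tracks physical time, c breaks ties logically."""

    def __init__(self, physical_now=lambda: int(time.time())):
        self._pt = physical_now  # assumed: whole seconds, fits in 32 bits
        self.l = 0               # largest physical time seen so far
        self.c = 0               # logical counter within the same l

    def now(self):
        """Timestamp a local or send event."""
        pt = self._pt()
        if pt > self.l:
            self.l, self.c = pt, 0   # physical clock moved ahead: reset counter
        else:
            self.c += 1              # same (or stale) physical time: count up
        return self.l, self.c

    def update(self, l_m, c_m):
        """Timestamp a receive event carrying the remote timestamp (l_m, c_m)."""
        pt = self._pt()
        l_new = max(self.l, l_m, pt)
        if l_new == self.l == l_m:
            c_new = max(self.c, c_m) + 1
        elif l_new == self.l:
            c_new = self.c + 1
        elif l_new == l_m:
            c_new = c_m + 1
        else:
            c_new = 0                # local physical clock is ahead of both
        self.l, self.c = l_new, c_new
        return self.l, self.c


def pack(l, c):
    """Pack (l, c) into one 64-bit integer: high 32 bits physical, low 32 logical."""
    return ((l & 0xFFFFFFFF) << 32) | (c & 0xFFFFFFFF)


hlc = HybridLogicalClock()
earlier = pack(*hlc.now())
later = pack(*hlc.now())
print(earlier < later)
```

Comparing the packed integers orders timestamps first by physical time and then by the logical counter, so a single 64-bit comparison is enough.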

The physical clock synchronization accuracy problem described above is also one of the main problems HLC currently faces. In a cloud-native NewSQL database, using HLC still requires high-precision time synchronization at the physical clock level. NTP-based HLC has low latency and small error on a LAN, but on a WAN the latency is long and unstable. For a multi-region distributed cluster, the error is very large, which greatly limits the concurrency the cloud-native NewSQL database can sustain. How to improve time synchronization accuracy, and thereby increase the read and write concurrency of the cloud-native NewSQL database, has become an urgent problem.

5. True-time Mechanism

The True-time mechanism is a time synchronization mechanism proposed by Google in the Spanner paper. It assumes that the timestamps generated by each machine in the system contain errors, has the entire system agree on a bound for that error, and delays transaction commits accordingly. The length of the delay depends on the timestamp error, so Google deploys master clock servers based on GPS and atomic clocks in each data center to keep the commit wait as short as possible.

True-time essentially ensures that the clock skew between any two servers stays within a very small range, so it places extremely high requirements on the accuracy of each machine's physical clock. Google synchronizes the time of every machine in a data center by deploying an expensive atomic clock + GPS system there. Compared with a traditional quartz clock, whose error is milliseconds or even seconds per day, an atomic clock can be accurate to within one second in 20 million years.
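The commit delay can be sketched as follows, loosely following the `TT.now()` / `TT.after()` interface described in the Spanner paper. The class names, the fixed `epsilon` error bound, the polling loop, and the injectable clock are assumptions of this illustration, not Spanner's or ZNBase's code.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # true time is guaranteed to be at or after this
    latest: float    # true time is guaranteed to be at or before this


class TrueTimeSketch:
    """Clock that reports an uncertainty interval instead of a single instant."""

    def __init__(self, epsilon, clock=time.time):
        self.epsilon = epsilon  # assumed agreed-upon error bound
        self._clock = clock

    def now(self):
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, timestamp):
        """True only once `timestamp` has definitely passed on every node."""
        return self.now().earliest > timestamp


def commit(tt, sleep=time.sleep, poll=0.001):
    """Pick a commit timestamp, then wait out the clock uncertainty."""
    commit_ts = tt.now().latest
    while not tt.after(commit_ts):   # commit wait, roughly 2 * epsilon
        sleep(poll)
    return commit_ts


tt = TrueTimeSketch(epsilon=0.004)   # 4 ms bound, chosen only for the demo
start = time.time()
commit(tt)
print(time.time() - start >= 0.008)
```

The wait is about twice the error bound, which is why shrinking that bound with better clock hardware directly shortens commit latency.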

Of course, the shortcomings of this solution are also obvious: because it depends on dedicated hardware, it is expensive and difficult to adopt widely outside Google.

ZNBase's Atomic Clock Scheme

ZNBase is a cloud-native NewSQL distributed database open sourced by Inspur. Like CockroachDB and TiDB, it is designed with reference to Google's Spanner and F1 papers. It offers strong consistency, a highly available distributed architecture, horizontal scalability, high performance, enterprise-grade security, and more. Its clock synchronization scheme has gone through several rounds of optimization, and it currently uses an industry-leading true-time scheme.

ZNBase originally also used NTP time synchronization + HLC. Because of the NTP error described above, clock errors of about 500 ms between nodes in the cluster greatly hurt the database's high-concurrency performance. To solve this problem, the ZNBase team decided to abandon the HLC module, introduce higher-precision atomic clocks together with the PTP protocol, and implement its own true-time solution.

PTP stands for Precision Time Protocol (IEEE 1588), a time synchronization protocol that relies on hardware-level support. It generally uses hardware timestamps together with a delay measurement algorithm that is more precise than NTP's. PTP packets can be parsed and timestamped directly at the MAC layer without passing through the UDP protocol stack, which reduces the time the packets spend in the protocol stack and thereby improves synchronization accuracy.

The following is the experimental comparison data of NTP and PTP conducted by the ZNBase team:

In tpmC stress tests under different concurrency levels, the atomic clock solution performs 10%-20% better than the original solution. Compared with the original scheme, it is also better suited to high-concurrency and transaction-conflict scenarios.

Because Google Spanner's true-time mechanism is implemented with atomic clock + GPS hardware and the related software is not open source, it cannot be copied or used as a direct reference. The ZNBase team therefore encountered many challenges while implementing its own true-time mechanism.

To address this, the ZNBase R&D team reviewed a large number of papers and technical materials, studied the theory behind true-time, ran extensive stress tests in the early stage, and worked with university research institutions on related sub-topics through school-enterprise cooperation. After a great deal of theoretical and practical research, ZNBase implemented its current atomic clock scheme.

At present, ZNBase's true-time solution is built on a high-precision time synchronization master clock that supports five modes: PTP, NTP, GPS, Beidou, and atomic clock. With GPS or Beidou as the reference clock, it tracks UTC with an accuracy better than 100 ns and can provide a time signal source accurate to hundreds of nanoseconds over Ethernet, which greatly reduces the clock offset range between different servers.

High-precision time-synchronized master clock

By replacing the NTP+HLC scheme with a PTP+atomic clock scheme, ZNBase keeps the clock error between nodes in the cluster within 1 ms, which solves the problem of large clock errors between distributed nodes in multi-site, multi-data-center deployments.

Summary

This article introduced several clock synchronization solutions commonly used in distributed database systems, as well as how ZNBase's clock synchronization solution was optimized and iterated. Readers interested in these technologies are welcome to leave a message for discussion and to point out any shortcomings.


More details about ZNBase can be found at:

Official code repository: https://gitee.com/ZNBase/zn-kvs

ZNBase official website: http://www.znbase.com/ 

If you have any questions about related technologies or products, please submit an issue or leave a message in the community for discussion. At the same time, developers who are interested in distributed databases are welcome to participate in the construction of the ZNBase project.

Contact email: [email protected]
