Harp: a distributed transaction optimization algorithm for cross-spatial domains


Zhuang Qiyu1,2, Li Tong1,2, Lu Wei1,2, Du Xiaoyong1,2

1 School of Information, Renmin University of China, Beijing 100872

2 Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing 100872

Abstract: The near-data computing paradigm has driven banks and securities firms to build multiple data centers across the country or around the world. In the traditional business model, a transaction accesses data within a single data center. As business models change, distributed transactions across data centers have become the norm, for example transfers between bank accounts or item exchanges between game accounts whose data are stored in data centers in different regions. Distributed transaction processing relies on the two-phase commit protocol to ensure that all participating nodes commit their sub-transactions atomically. In cross-spatial-domain scenarios, the network delays between nodes are longer and more varied, and traditional transaction processing techniques need to be extended so that the system can sustain higher throughput. After analyzing the problems and optimization space of cross-domain transactions, a new distributed transaction processing algorithm, Harp, is proposed. While guaranteeing the serializable isolation level, Harp delays the execution of some sub-transactions according to the differences in network delay, reducing the lock contention time of transactions and improving system concurrency and throughput. Experiments show that under the YCSB workload, Harp improves performance by 1.39 times over the traditional algorithm.

Keywords: cross-spatial-domain distributed transaction; network difference; transaction scheduling; lock contention


Paper citation format:

Zhuang Qiyu, Li Tong, Lu Wei, et al. Harp: Distributed transaction optimization algorithm for cross-spatial domains [J]. Big Data, 2023, 9(4): 16-31.

ZHUANG Q Y, LI T, LU W, et al. Harp: optimization algorithm for cross-domain distributed transactions[J]. Big Data Research, 2023, 9(4): 16-31.


0 Preface

At the beginning of 2022, China completed the overall layout design of the national integrated big data center system and officially launched the "Eastern Data, Western Computing" project, deploying national computing hub nodes in eight regions, including Beijing-Tianjin-Hebei, the Yangtze River Delta, and Guangdong-Hong Kong-Macao, to build a nationwide integrated computing power network. Data management is evolving from isolated services oriented to a single spatial domain (a single data center) to collaborative services across spatial domains (across data centers). Distributed databases are widely used for managing and analyzing large-scale data thanks to their high scalability, high availability, and low cost. To keep computation close to data, banks, e-commerce companies, and cloud vendors often build multiple data centers; for example, Amazon has built 38 data centers and Apple has built 11 (6 in the United States, 2 in Denmark, and 3 in China). In this way, user data can be stored in the nearest data center, and when the read and write operations of a user-initiated transaction are limited to that user's own data, near-data computing guarantees high performance. However, cross-domain transactions break the assumption of near-data computing. A cross-domain transaction involves users in two different regions; for example, a user in Guangdong needs to transfer 10,000 yuan to a user in Beijing. Since the two users' data are stored in different data centers, the transaction must coordinate data across them.

When processing distributed transactions across spatial domains, database concurrency control faces new challenges. The two-phase locking (2PL) protocol is a classic concurrency control protocol for distributed databases, and an important factor affecting its performance is the lock contention duration: the time from when a transaction acquires a lock on a data item during the execution phase to when it releases that lock. Under the influence of network delay, the lock contention duration of a distributed transaction is much longer than that of a single-node transaction, which limits system concurrency and increases the rollback rate. In cross-spatial-domain scenarios, a distributed transaction may involve nodes in multiple data centers, with longer and more varied network delays between them. A distributed transaction usually has to wait for its "slowest" sub-transaction (the sub-transaction whose participant has a large network delay to the coordinator node) to complete before it can commit, which further lengthens lock contention and poses a greater challenge to cross-spatial-domain distributed transaction processing.

To minimize the lock contention time of cross-spatial-domain distributed transactions while ensuring correctness, this paper proposes a new transaction processing algorithm, Harp. Harp follows the two-phase locking protocol, so transaction correctness is guaranteed. In addition, Harp fully accounts for network differences: it adjusts the execution time of sub-transactions based on network delays evaluated in real time and delays the "fast" sub-transactions (those with smaller network delay between participant and coordinator nodes). This reduces the unnecessary lock contention caused by "fast" sub-transactions waiting for "slow" ones, improves the utilization of lock resources, and increases system concurrency and throughput. Experiments show that in some scenarios Harp's throughput reaches 2.39 times that of the traditional algorithm, while the rollback rate is reduced by 32%.

1 From single-spatial-domain distributed transactions to cross-spatial-domain distributed transactions

1.1 Single-spatial-domain distributed transactions

A transaction is a sequence of user-defined database operations and is the smallest execution unit in a database management system. A distributed transaction refers to a transaction that includes operations on data items on different nodes. When the server receives a distributed transaction, it is usually handed over to a coordinator.

In the execution phase, the coordinator splits the transaction into multiple sub-transactions and distributes them to the participants, which execute them according to the concurrency control algorithm. If 2PL is used as the concurrency control protocol, participants lock data items during the execution phase; this suits high-conflict scenarios but reduces system concurrency, and deadlocks must be handled. If optimistic concurrency control (OCC) is adopted, participants do not lock data items during execution and instead validate them before commit, which suits low-conflict scenarios. After the execution phase, distributed databases usually use the two-phase commit (2PC) protocol to ensure consistency. 2PC divides transaction commit into two phases. In the prepare phase, the coordinator collects the execution status of each participant and determines the final status of the transaction from the results of the sub-transactions. If all participants can commit, the coordinator sets the transaction status to "commit" and notifies every participant; if any participant cannot commit, the coordinator sets the status to "rollback" and notifies all participants to roll back their sub-transactions.
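As a minimal illustration of this commit rule (a sketch with assumed type and function names, not the paper's implementation), the coordinator's decision logic can be expressed as follows: the transaction commits only if every participant voted to prepare, otherwise it rolls back.

```cpp
#include <vector>

// Vote returned by a participant at the end of the prepare phase.
enum class Vote { Prepared, Abort };

// Final decision broadcast by the coordinator in the commit phase.
enum class Decision { Commit, Rollback };

// 2PC decision rule: the transaction commits only if every participant voted
// "Prepared"; a single Abort vote forces a global rollback.
Decision decide(const std::vector<Vote>& votes) {
    for (Vote v : votes) {
        if (v == Vote::Abort) return Decision::Rollback;
    }
    return Decision::Commit;
}

int main() {
    // One participant cannot commit, so the whole transaction rolls back.
    std::vector<Vote> votes = {Vote::Prepared, Vote::Abort, Vote::Prepared};
    return decide(votes) == Decision::Rollback ? 0 : 1;
}
```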

Next, an example is used to illustrate more intuitively how a distributed transaction executes within a data center. Suppose transaction T1 is divided into 3 sub-transactions T11~T13, which access partitions N1~N3 respectively, with N1 serving as T1's coordinator. If the 2PL and 2PC protocols are used, the transaction executes as shown in Figure 1. The sub-transaction T11 runs locally at N1, while T12 and T13 finish their execution phase after one round-trip time (RTT). After another RTT for the preparation phase, the coordinator obtains the execution results of each sub-transaction. In the commit phase, the coordinator determines the final status of the transaction (commit/rollback) based on these results and sends it to the participants, and each participant releases its locks after receiving the end message. Because the network difference between N1→N2 and N1→N3 is small, T12 and T13 proceed essentially in step. In Figure 1, the marked interval represents the lock contention duration on a data item; during this time, other transactions that want to operate on the data item with a conflicting lock type are blocked or rolled back.


Figure 1 Single-spatial-domain distributed transaction execution process

1.2 Distributed transactions across spatial domains

In recent years, architectures such as "two locations, three centers" and "three locations, five centers" have been proposed, which means that databases will handle more and more distributed transactions across spatial domains. Figure 2 gives an example of "three locations, five centers": two data centers are deployed in each of New York and Beijing, and one data center is deployed in London. This design accommodates large-scale data growth while guaranteeing higher availability.


Figure 2 Schematic diagram of "three locations, five centers"

Distributed transaction processing introduces multiple rounds of network communication, including two-phase commit and replica synchronization. Moving from within a data center to across data centers, the distance between regions means that cross-spatial-domain network transmission has a higher base latency. Inside a data center, RDMA-based high-speed networks (InfiniBand (IB) or RoCE v2) can provide stable, low latency, whereas the end-to-end transmission and shared-medium characteristics of WAN links make wide-area transmission latency uncertain. As shown in Figure 3, communication delays in cross-spatial-domain networks are usually 10 ms or more, while network delays within a data center are usually on the order of microseconds, and dedicated IB networks can even reach around ten microseconds. Compared with the inside of a data center, the network delay between cross-spatial-domain nodes cannot be ignored, so network transmission between cross-spatial-domain nodes becomes a bottleneck that limits database system performance.


Figure 3 Network latency in different scenarios

Considering that the communication delay of cross-spatial-domain networks is much larger than that within a data center, and that a distributed transaction must wait for all sub-transactions to finish before its final status can be determined, cross-spatial-domain distributed transactions suffer even higher latency than distributed transactions within a data center. In the cross-spatial-domain scenario, the execution of the example in Figure 1 changes. As shown in Figure 4, assume that the RTT of N1→N3 is much larger than that of N1→N2. After sub-transaction T12 finishes execution, the final state of T1 cannot be determined until T13 returns its result. The execution result of T13 takes a long network delay to reach N1, which introduces unnecessary lock contention time for T11 and T12. In high-contention scenarios, the number of blocked or rolled-back transactions also increases.


Figure 4 Cross-data center distributed transaction execution process

2 Challenges of distributed transactions across spatial domains

Section 1 used a single-transaction scenario as an example to give a preliminary discussion of the impact of network differences on distributed transactions in cross-spatial-domain scenarios. This section extends the problem to the more general multi-transaction scenario and analyzes the problems and optimization space of cross-spatial-domain distributed transactions.

2.1 Problem description

Distributed databases divide data into different partitions. To preserve near-data computing, user data is usually deployed in the data center closest to the user. Section 1 introduced how distributed transactions rely on the 2PC protocol to guarantee atomicity and isolation. With the Early-Prepare optimization, a participant node enters the preparation phase directly after the execution phase without waiting for a notification message from the coordinator. The lock contention duration is thus reduced from the two rounds of network communication in Figures 1 and 4 to one round: a transaction locks its data items during the execution phase, completes the work of the original preparation phase, and after one round of network communication receives the transaction status from the coordinator, unlocks the data items and commits (or rolls back). This paper analyzes and optimizes cross-spatial-domain distributed transactions under this execution model. Next, an example is used to describe more concretely the problems faced by cross-spatial-domain distributed transaction processing. As shown in Figure 5, there are three database nodes N1, N2 and N3. Data items x and y are located at N1, data items u and v at N2, and data item z at N3; the network delay between N1 and N2 is 10 ms, between N1 and N3 is 30 ms, and between N2 and N3 is 25 ms.

A client at N1 initiates 4 transactions. T1 involves the three nodes N1~N3, and its operation sequence is R1(x)W1(y)W1(z)R1(v)W1(u). T2 and T3 involve the two nodes N1 and N2; T2's operations include R2(x), R2(v) and W2(x), and both T2 and T3 also operate on data item u. T4 involves a single node, and its operation is W4(x).

As shown in Figure 6, the database schedules the above instance under the 2PL and 2PC protocols. The coordinator divides T1 into sub-transactions T11~T13 according to the nodes involved in its operations and sends them to the corresponding participants (message 1 and message 2). T2~T4 are likewise split into sub-transactions and sent to their participants (for clarity, the figure omits the messages that distribute the sub-transactions of T2~T4). When T11 executes, it obtains a shared lock on data item x and an exclusive lock on data item y; similarly, T12 and T13 obtain the corresponding locks and return their execution results to the coordinator (message 3 and message 4). When T2~T4 execute, they find that the locks on the data items they need are already held by T1 with conflicting lock types, so T2~T4 are blocked by T1 until T1 releases its locks.


Figure 6 Execution logic of 2PL protocol + 2PC protocol

After 60 ms, the coordinator receives the execution result returned by T13 (message 4), determines the final status of T1 from the sub-transactions' execution results, and sends it to each participant node (message 5). Participant N2 receives the message 10 ms later; after updating T1's transaction status it releases the locks T1 acquired and removes T22 from the lock's waiting queue so that it can continue executing. When T22 finishes, it sends its execution result to N1 (message 6). After receiving the execution results of all of T2's sub-transactions, N1 determines the final status of T2 and notifies each participant (message 7). The execution logic of T3 and T4 is similar. As can be seen, when the 2PL and 2PC protocols are used for scheduling, the sums of the lock contention durations over all data items of T1~T4 are 300 ms, 60 ms, 40 ms and 0 ms, respectively.

2.2 Optimization space

If the data distribution is not changed through replica migration, the lock contention duration of a distributed transaction is at least one RTT (the participant notifies the coordinator after finishing its sub-transaction, and the coordinator can immediately determine the transaction status upon receiving the message and notify the participant to release the lock). In Figure 6, after sub-transactions T11 and T12 finish executing, the final status of the transaction cannot be determined until the execution result of T13 arrives. During this period, transactions that conflict with T11 or T12 on their data items are blocked or rolled back, which limits the concurrency of the system. This paper refers to the situation where the lock contention duration equals the network RTT as "necessary lock contention"; if the lock contention duration exceeds the network RTT, the excess is considered "invalid lock contention".

In the ideal case, no sub-transaction incurs any "invalid lock contention"; that is, the lock contention duration of a transaction on a single data item equals the RTT of the corresponding network. The sums of the lock contention durations of transactions T1~T4 over their data items would then be 80 ms, 20 ms, 20 ms and 0 ms, respectively. It is not hard to see that this differs considerably from the actual execution, and as network latency and its variability increase, the gap becomes even more pronounced.

In cross-spatial-domain distributed transactions, the delays between participant nodes differ greatly, so a large amount of "invalid lock contention" occurs. To alleviate the impact of cross-spatial-domain distributed transactions on system performance, this paper proposes a new transaction processing algorithm, Harp, which, without affecting correctness, delays the execution of sub-transactions to reduce lock contention time and improve database system throughput.

3 Harp: Cross-spatial domain distributed transaction optimization algorithm

3.1 Algorithm description

In the example of Figure 6, sub-transactions T11 and T12 complete at 0 ms and 20 ms respectively, but T1 must wait until T13 returns its result before entering the commit phase. Executing T11 and T12 early does not reduce the execution time of transaction T1; it only lengthens the contention time on the corresponding data items. Assuming that the local execution time of a transaction is much smaller than the network delay between cross-spatial-domain nodes, delaying the execution of some sub-transactions can therefore effectively reduce the transaction's lock contention time. However, the delay applied to a sub-transaction is not unconstrained. Taking T1 as an example, if T12 is delayed until 50 ms, its result returns to N1 at 70 ms; the time at which T1's transaction status can be determined is then extended from the original 60 ms to 70 ms, which is something Harp wants to avoid.

Harp aims to minimize the lock contention duration of transactions. For a transaction T, the lock contention duration can be expressed as:

$$\mathrm{LockTime}(T)=\sum_{i}\left[\mathrm{UnLock}(k_i)-\mathrm{Lock}(k_i)\right] \qquad (1)$$

where ki is a data item that T operates on, and Lock(ki) and UnLock(ki) are the times at which that data item is locked and unlocked, respectively.

In addition, Harp modifies the locking logic. When a remote transaction Ti requests a lock and conflicts with the transaction Tj that currently holds it, Ti waits if Tj is a local transaction; otherwise Ti is rolled back. Since local transactions usually execute quickly, this modified locking logic allows remote transactions to execute promptly and return their results to the coordinator.
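The following is a minimal sketch of this conflict-handling rule, assuming illustrative structures and names rather than Harp's actual code: a remote requester waits only when the current lock holder is a local transaction, and is rolled back when the holder is another remote transaction.

```cpp
#include <cstdint>

// Outcome of a lock request under Harp's modified conflict handling.
enum class LockResult { Granted, Wait, Abort };

struct TxnInfo {
    uint64_t id;
    bool is_local;   // true if the transaction only touches this node
};

// 'conflicts' is true when the requested lock mode is incompatible with the
// mode already held by 'holder' (holder == nullptr means the lock is free).
LockResult on_lock_request(const TxnInfo& requester, const TxnInfo* holder,
                           bool conflicts) {
    if (holder == nullptr || !conflicts) return LockResult::Granted;
    if (!requester.is_local) {
        // Remote sub-transaction: wait only behind a fast local holder,
        // otherwise roll back rather than block on another remote transaction.
        return holder->is_local ? LockResult::Wait : LockResult::Abort;
    }
    // Local requester: fall back to the usual 2PL policy (e.g., Wait-Die),
    // simplified here to waiting.
    return LockResult::Wait;
}

int main() {
    TxnInfo remote{1, false};
    TxnInfo local_holder{2, true};
    return on_lock_request(remote, &local_holder, /*conflicts=*/true) ==
                   LockResult::Wait
               ? 0
               : 1;
}
```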

According to the 2PC protocol, the final status of a transaction can be determined only after the coordinator has received the execution results of all sub-transactions; that is, the execution delay of a transaction depends on its slowest sub-transaction. Let RTT(Nc, Ni) denote the round-trip time between the coordinator Nc and the participant Ni that stores data item ki, and let DelayTime(ki) denote the time by which Harp delays sending the sub-transaction ti that contains ki. Under Harp's locking logic, Lock(ki) can be expressed as DelayTime(ki) + RTT(Nc, Ni)/2, the transaction status reaches Ni another RTT(Nc, Ni)/2 after the coordinator's decision, and the execution time of T can be approximated as max_j{DelayTime(kj) + RTT(Nc, Nj)}. Equation (1) can therefore be rewritten as:

$$\mathrm{LockTime}(T)=\sum_{i}\left[\max_{j}\{\mathrm{DelayTime}(k_j)+\mathrm{RTT}(N_c,N_j)\}-\mathrm{DelayTime}(k_i)\right] \qquad (2)$$

In addition, Harp does not want the delayed execution of sub-transactions to make the execution delay of the transaction exceed max_j RTT(Nc, Nj), so every sub-transaction should return its result no later than max_j RTT(Nc, Nj). Combining this with Equation (2), Harp's optimization objective and constraints can be written as:

$$\min \sum_{i}\left[\max_{j}\mathrm{RTT}(N_c,N_j)-\mathrm{DelayTime}(k_i)\right] \quad \text{s.t. } \mathrm{DelayTime}(k_i)+\mathrm{RTT}(N_c,N_i)\le \max_{j}\mathrm{RTT}(N_c,N_j),\ \forall i \qquad (3)$$

Since max_j RTT(Nc, Nj) is a constant, Equation (3) can be further rewritten as Equation (4):

$$\max \sum_{i}\mathrm{DelayTime}(k_i) \quad \text{s.t. } \mathrm{DelayTime}(k_i)\le \max_{j}\mathrm{RTT}(N_c,N_j)-\mathrm{RTT}(N_c,N_i),\ \forall i \qquad (4)$$

Substitute the constraint equation into the objective function to obtain equation (5):

$$\sum_{i}\mathrm{DelayTime}(k_i)\le \sum_{i}\left[\max_{j}\mathrm{RTT}(N_c,N_j)-\mathrm{RTT}(N_c,N_i)\right] \qquad (5)$$

It can be seen that the objective function attains its maximum when the equality in Equation (5) holds. Assuming that data item ki belongs to sub-transaction ti, the delay time should therefore satisfy:

$$\mathrm{DelayTime}(k_i)=\max_{j}\mathrm{RTT}(N_c,N_j)-\mathrm{RTT}(N_c,N_i) \qquad (6)$$

Before a transaction enters the execution phase, Harp adjusts the sending times of its sub-transactions according to Equation (6). The adjustment procedure is shown in Algorithm 1, whose time complexity is O(n).

Algorithm 1: Adjust the sending time of sub-transactions

Input: G: network delay graph; T: transaction

Output: Subtransaction send time

1. Function CalculateSendTime(G, T)
2. max_latency ← 0, start_time ← get_sys_clock()
3. for ti ∈ T.subtxn do
4.   j ← get_node(ti)
5.   ti.lat ← G[c][j]
6.   max_latency ← max{max_latency, ti.lat}
7. for ti ∈ T.subtxn do
8.   ti.send_time ← start_time + 2·(max_latency − ti.lat)
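A runnable C++ sketch of Algorithm 1 is given below (field and function names are assumptions, not the Deneva implementation). Given the one-way delay from the coordinator to each participant, each sub-transaction's send time is pushed back by twice the gap to the slowest participant, which matches Equation (6) when RTT is taken as twice the one-way delay.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct SubTxn {
    int node_id;              // participant node this sub-transaction is sent to
    uint64_t lat_ms;          // one-way delay coordinator -> participant
    uint64_t send_time_ms;    // computed sending time
};

// Network delay graph G: G[c][j] = one-way delay in ms from node c to node j.
using DelayGraph = std::unordered_map<int, std::unordered_map<int, uint64_t>>;

static uint64_t get_sys_clock_ms() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
               steady_clock::now().time_since_epoch()).count();
}

// Algorithm 1: delay each sub-transaction by twice the gap between its own
// one-way delay and the largest one-way delay, so that all sub-transactions
// are expected to report back to the coordinator at roughly the same time.
void calculate_send_time(const DelayGraph& g, int coordinator,
                         std::vector<SubTxn>& subtxns) {
    uint64_t max_latency = 0;
    const uint64_t start_time = get_sys_clock_ms();
    for (SubTxn& t : subtxns) {                 // first pass: find the slowest
        t.lat_ms = g.at(coordinator).at(t.node_id);
        max_latency = std::max(max_latency, t.lat_ms);
    }
    for (SubTxn& t : subtxns) {                 // second pass: set the delays
        t.send_time_ms = start_time + 2 * (max_latency - t.lat_ms);
    }
}

int main() {
    // Example one-way delays (ms) from coordinator N1 to N1, N2 and N3.
    DelayGraph g{{1, {{1, 0}, {2, 10}, {3, 30}}}};
    std::vector<SubTxn> t1 = {{1, 0, 0}, {2, 0, 0}, {3, 0, 0}};   // T11, T12, T13
    calculate_send_time(g, /*coordinator=*/1, t1);
    // With these delays the three sub-transactions are postponed by 60, 40 and 0 ms.
    return 0;
}
```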

Under Harp's scheduling, the execution of the instance in Figure 6 changes. As shown in Figure 7, before the transactions are executed, the sending times of the sub-transactions are calculated by Algorithm 1: T12 and T13 are delayed by 40 ms and 0 ms respectively, T11 is delayed so that it executes at 60 ms, and T21 and T31 are delayed by 20 ms; no adjustment is made for the single-node transaction T4. After this adjustment, T1~T4 can all finish within 60 ms.


Figure 7 Optimized execution logic

At 0 ms, N1 sends sub-transaction T13 to N3 (message 1) and sub-transaction T22 to N2 (message 2). Because the sub-transactions of T1~T3 (including their local sub-transactions) are delayed, there is no lock on data item x when T4 executes W4(x), so T4 is not blocked and commits immediately. At 10 ms, N2 receives T22, executes it, and returns the result (message 3). At 20 ms, N1 receives the execution result of T22 and executes the local sub-transaction T21; it then determines T2's status from the execution results of T21 and T22 and sends the status to N2 (message 4). After another 10 ms, N2 receives the message; once T2's status has been updated, the lock on data item u is released, T32 acquires the lock on u, and returns its execution result (message 5). At this point, T2~T4 have all finished within 40 ms. The coordinator N1 then sends sub-transaction T12 to N2 (message 6). At 60 ms, N1 executes the local sub-transaction T11 and receives the execution results of T12 and T13 (message 7 and message 8), and the final status of T1 is determined from the execution results of its sub-transactions; the resulting lock contention durations are listed in Table 1. By staggering T1's sub-transactions, T11's delayed execution reduces T1's lock contention on data items x and y without affecting T1's overall execution, and compared with scheduling under the 2PL and 2PC protocols, T2~T4 all finish in a much shorter time.

Table 1 Lock contention durations in the example

At the same time, Harp also adjusts the message-sending logic of the database system, as shown in Figure 8. When there is a message to process, it is taken from the head of the message queue and its send time is compared with the current system time. If the current system time is later than the send time, the sub-transaction needs to be executed and the message is sent immediately; otherwise, the execution of the sub-transaction contained in the message can still be postponed, so the message is pushed to the tail of the queue and the other messages in the queue continue to be processed. To further reduce lock contention duration, messages involving lock operations are given higher priority on the receiving side and are assigned to worker threads first.
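The sketch below illustrates this sending loop and the receive-side priority under the stated assumptions (the queue layout and message fields are illustrative): a message whose send time has not arrived is moved to the tail of the queue, and lock-related messages are drained from a higher-priority queue before other messages.

```cpp
#include <chrono>
#include <cstdint>
#include <deque>
#include <string>

struct Message {
    uint64_t send_time_ms;   // earliest time at which this message may be sent
    bool lock_operation;     // true if the message carries a lock request/release
    std::string payload;
};

static uint64_t now_ms() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
               steady_clock::now().time_since_epoch()).count();
}

// Stand-in for the real network send, which is elided here.
static void transmit(const Message& /*m*/) {}

// Sender side: pop the head of the queue; if its send time has been reached,
// transmit it, otherwise defer it by moving it to the tail of the queue.
static void pump_outgoing(std::deque<Message>& out_queue) {
    const std::size_t n = out_queue.size();
    for (std::size_t i = 0; i < n; ++i) {
        Message m = out_queue.front();
        out_queue.pop_front();
        if (now_ms() >= m.send_time_ms) {
            transmit(m);                // due (or overdue): send immediately
        } else {
            out_queue.push_back(m);     // still too early: postpone
        }
    }
}

// Receiver side: lock-related messages are handed to worker threads before
// other traffic so that lock acquisition and release are not delayed.
static bool pop_next_incoming(std::deque<Message>& lock_msgs,
                              std::deque<Message>& other_msgs, Message& out) {
    std::deque<Message>* q = !lock_msgs.empty() ? &lock_msgs : &other_msgs;
    if (q->empty()) return false;
    out = q->front();
    q->pop_front();
    return true;
}

int main() {
    std::deque<Message> out;
    out.push_back({now_ms(), false, "sub-txn T13"});        // due now: sent
    out.push_back({now_ms() + 40, false, "sub-txn T12"});   // delayed by 40 ms: requeued
    pump_outgoing(out);
    return out.size() == 1 ? 0 : 1;
}
```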


Figure 8 Message sending process

3.2 Correctness analysis

Serializability is the gold standard for correct transaction scheduling. Harp adjusts the order in which sub-transactions execute based on network differences and optimizes the lock contention duration of transactions, but its locking still follows the two-phase locking protocol. In addition, Harp uses the No-Wait and Wait-Die strategies to prevent deadlocks and avoid dependency cycles. Since schedules that follow the two-phase locking protocol satisfy conflict serializability, transaction scheduling with Harp still achieves the serializable isolation level.

3.3 Feasibility analysis

In cross-spatial-domain scenarios, the network delays between nodes and the differences among them are both large. Based on the distribution of the data items involved and the network characteristics, Harp delays the execution of some sub-transactions without affecting transaction latency, thereby shortening the lock contention time of hotspot data and improving system concurrency. The biggest difference between Harp and deterministic databases is that Harp does not require batching of transactions. In addition, Harp can still be used to optimize stored procedures without logical predicates, single-statement interactive transactions, and the first statement of multi-statement interactive transactions.

In addition, the DTP model is the way current database applications handle cross-domain transactions. Transaction scheduling can be implemented at multiple levels, including the database kernel, the middleware, and the client. In the DTP model, the client application connects to each database through middleware and completes 2PC/3PC through the transaction manager (TM) to ensure the atomicity and isolation of transactions. In a distributed database the coordinator node is responsible for scheduling transactions, whereas in the DTP model the middleware coordinates the transactions in each database. By sensing the differences in network delay to each target node, the middleware can apply the algorithmic idea of this paper to determine the delay time of each sub-transaction and postpone sending it, which likewise reduces lock contention time and improves system concurrency. The idea of this paper therefore remains valid in the DTP model.

3.4 Targeted discussion

Harp's goal is to exploit network latency differences to delay the execution of some sub-transactions and reduce the lock contention time of hot data in cross-spatial-domain scenarios. However, a delayed sub-transaction may not return immediately because its data items are occupied by other concurrent distributed transactions, in which case the expected benefit is not achieved. The current work rolls the transaction back directly in this situation to avoid waiting on distributed transactions, which is a relatively aggressive approach. In follow-up work, metadata recording historical behavior could be used to predict the delay between a sub-transaction being picked up by a worker thread and acquiring its locks, and Equation (3) could be modified accordingly.

Besides the pessimistic concurrency control algorithm 2PL, optimistic concurrency control (OCC) and multi-version concurrency control (MVCC) are also commonly used in database systems, and they too suffer from poor performance and high rollback rates in cross-spatial-domain scenarios. MVCC by itself only supports the snapshot isolation level; it achieves the serializable isolation level only when combined with 2PL, OCC, and similar mechanisms, as in MV2PL and MVOCC. To guarantee atomicity and isolation, OCC validates a transaction's read-write set before commit; if other concurrent transactions have modified the relevant data items in the meantime, the transaction fails validation and is rolled back. In a cross-spatial-domain scenario the network delay between nodes is long, and a transaction needs at least one round of network communication between accessing a data item and validating it; the probability that the data item is modified during this window grows, so the system rollback rate rises. Using Harp's idea, the execution of some sub-transactions can be delayed according to the network delay differences, shortening the interval between the execution and the validation of the corresponding sub-transaction, which reduces the chance that data items are modified by other concurrent transactions and lowers the system rollback rate.

In addition, in a multi-replica system the overhead of replica consistency lies in the 2PC protocol, whereas Harp's optimization targets the execution phase of the transaction. Harp's optimization is therefore orthogonal to replica consistency and can also be applied to multi-replica systems. Under the traditional 2PC+Paxos execution logic, Harp can still delay the locking time in the execution phase and thereby shorten the transaction's lock contention time. Harp is equally effective with execution schemes that restructure the execution phase, such as TAPIR and G-PAC. During the execution phase, TAPIR sends read and write operations to every replica and performs concurrency control at each replica. In this setting, before sending read and write operations to the replicas, Harp can take the delay from the coordinator node to each replica into account, plug it into Algorithm 1 to obtain the delay time of each sub-transaction, and by delaying sub-transaction execution still reduce lock contention time and improve system concurrency.

4 Experimental evaluation

4.1 Experimental environment

This article implements Harp in C++ on the distributed transaction testing platform Deneva [21] and compares the algorithms under the same experimental environment and workload. A total of 8 servers participated in the experiments; the parameters of each server are shown in Table 2. Four servers act as clients and the other four as servers: the clients generate transactions and send them to the servers for execution. Cross-spatial-domain scenarios are simulated by adding network delay at the network card. The four servers are divided into 2 groups to simulate a "two locations, three centers" scenario: the network delay between cities (international) is set to 300 ms, the delay between different data centers in the same city is set to 25 ms, and the delay within a data center is set to 10 ms.

Table 2 Server configuration parameters

Performance tests are based on the YCSB workload. YCSB is a comprehensive benchmark that simulates large-scale Internet applications. Its data set is a relation with 10 attributes, one of which is the primary key; each record is approximately 1 KB, and the data set is distributed across nodes by horizontal partitioning. 2PL is a classic pessimistic concurrency control algorithm that can use the Wait-Die strategy to avoid deadlocks and is widely used in database products such as MySQL and OceanBase. The experiments therefore compare the performance of 2PL and Harp on YCSB.
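For context, the skewed access pattern of a YCSB-style workload is typically drawn from a Zipfian distribution over keys; the sketch below (illustrative, not the Deneva workload generator) builds the cumulative distribution for n keys with skew factor θ and samples keys from it, so that a larger θ concentrates accesses on fewer hot keys.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Build the cumulative distribution of a Zipfian over key ranks 1..n with
// skew theta: P(rank i) is proportional to 1 / i^theta.
std::vector<double> zipf_cdf(std::size_t n, double theta) {
    std::vector<double> cdf(n);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += 1.0 / std::pow(static_cast<double>(i + 1), theta);
        cdf[i] = sum;
    }
    for (double& c : cdf) c /= sum;   // normalize so the last entry is 1.0
    return cdf;
}

// Sample a key rank in [0, n) according to the precomputed CDF.
std::size_t zipf_sample(const std::vector<double>& cdf, std::mt19937_64& rng) {
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    return std::lower_bound(cdf.begin(), cdf.end(), uni(rng)) - cdf.begin();
}

int main() {
    std::mt19937_64 rng(42);
    // A skew factor of 0.8 corresponds to the high-contention setting in Figure 9.
    std::vector<double> cdf = zipf_cdf(10000, 0.8);
    std::size_t hot_hits = 0;
    for (int i = 0; i < 100000; ++i) {
        if (zipf_sample(cdf, rng) < 100) ++hot_hits;   // hits on the 100 hottest keys
    }
    return hot_hits > 0 ? 0 : 1;
}
```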

4.2 Performance testing

First, the author tested the impact of the skew factor on throughput and rollback rate; the results are plotted in Figure 9. The larger the skew factor, the more severe the contention on hotspot data. Harp performs better than 2PL (Wait-Die) in all tested scenarios. At a skew factor of 0.8, Harp's throughput improved by 1.36 times compared with 2PL, while the rollback rate decreased by 11%.


Figure 9 Performance under different skew factors

Next, the author tested the performance of Harp and 2PL under different read-write ratios with the proportion of distributed transactions fixed at 50%; the results are shown in Figure 10. As the proportion of write operations increases, both Harp and 2PL see throughput drop and rollback rate rise, but Harp's degradation is relatively mild. When the write ratio is 0, all transactions are read-only; 2PL only needs shared locks for read operations, so there are no transaction conflicts in this scenario and the two algorithms perform similarly. As the proportion of read-write transactions increases, Harp's advantage gradually becomes obvious. When the write ratio is 100%, that is, when all transactions are write transactions, Harp's throughput reaches 2.39 times that of 2PL, and its rollback rate is also far lower than that of 2PL (Wait-Die).


Figure 10 Performance test with different proportions of write transactions

Finally, under a medium-contention scenario (θ=0.6), the throughput and rollback rate of Harp and 2PL were tested at various proportions of distributed transactions; the results are shown in Figure 11. Because Harp does not optimize single-node transactions, 2PL performs slightly better than Harp on a purely single-node workload (the proportion of distributed transactions is 0), which shows that Harp's additional overhead is small (2.7%). As the proportion of distributed transactions increases, the performance of both algorithms declines, but Harp's throughput is always higher than 2PL's. When processing cross-spatial-domain distributed transactions, Harp delays sub-transaction execution based on network differences and shortens lock contention time. The author also measured the average lock contention time of data items: Harp's lock contention time is much lower than 2PL's, and when the proportion of distributed transactions is 100%, Harp's average lock contention time is 26.8% of 2PL's.


Figure 11 Performance test of different distributed transaction proportions

Harp's performance improvement comes from three aspects: ① Harp delays the execution of some sub-transactions based on network differences, which shortens the lock contention time of transactions, improves system concurrency, and makes fuller use of lock resources; ② Harp modifies the locking logic so that remote transactions do not wait for each other; although this causes some unnecessary rollbacks, it avoids excessively long waits, and experiments show that Harp's rollback rate is lower than 2PL's in a variety of scenarios; ③ Harp improves the message queue so that messages involving lock operations are responded to faster, further shortening the lock contention time of transactions.

5 Related work

Distributed concurrency control algorithms can generally be divided into static ordering and dynamic ordering algorithms. In a static ordering algorithm the order of transactions is determined statically. The main idea of OCC is to check at commit time whether data items have been or could be modified, usually using the timestamp at which the transaction enters the validation phase as the basis of ordering. Silo, proposed by Tu et al., is a variant of OCC: it maintains each transaction's read and write sets and, during validation, detects conflicts by checking whether its own read set has been modified by other transactions. Pessimistic concurrency control algorithms such as 2PL order transactions by the order in which locks on conflicting data items are granted. In addition, Calvin (proposed by Thomson et al.), Aria, and Caracal are deterministic concurrency control algorithms: in Calvin, transactions are processed in batches, the transactions within a period are statically ordered, and locks are then acquired in that order. Dynamic ordering algorithms assign each transaction a timestamp interval and select the commit timestamp within that interval according to certain rules; when a transaction commits, the timestamp intervals of related transactions are dynamically updated and adjusted. Boksenbaum et al. first applied dynamic timestamp adjustment to distributed concurrency control. In recent years there has been much work on dynamic timestamp adjustment, such as MaaT, Sundial, and TicToc. MaaT uses additional metadata and transaction queues to track accesses to data items and record dependencies between concurrent transactions; by adjusting the upper and lower bounds of a transaction's logical timestamp interval, it selects a reasonable commit interval and thus dynamically determines the order between transactions. More recently, to combine the advantages of different concurrency control schemes, Liu et al. proposed a hybrid algorithm based on the Actor model that mixes deterministic and non-deterministic concurrency control.

In high-contention scenarios there are many conflicts and dependencies between concurrent transactions, and to preserve the ACID properties and avoid data anomalies a large number of transactions are rolled back. Several solutions have been proposed to optimize lock contention duration in such scenarios. Faleiro et al. proposed lazy execution and early write visibility to reduce data contention. Guo et al., in Bamboo, relaxed the constraints of the two-phase locking protocol by conditionally releasing locks during transaction execution, reducing contention and improving execution efficiency. Li et al. proposed SwitchTx, which offloads part of the transaction logic to programmable switches to reduce network scheduling overhead and data contention. In addition, Zamanian et al. proposed Chiller, which analyzes the transaction dependency graph before execution to identify hotspot data and places that data locally to reduce lock contention on it.

In cross-spatial-domain scenarios, databases partition data items for scalability and use multi-replica techniques for high availability. When processing distributed transactions that access multiple partitions, Megastore, Spanner, and CockroachDB need multiple rounds of network communication for transaction execution, 2PC, log synchronization, and transaction state replication; multiple rounds of WAN communication lead to high transaction latency. To shorten this delay, MDCC, RedT, and TAPIR perform 2PC and log synchronization in parallel. Natto, proposed by Yang et al., takes cross-spatial-domain network differences into account: it estimates the time at which a transaction reaches the farthest participant and assigns timestamps based on that time to reduce the latency of high-priority transactions. There is also a large body of work on optimizing data placement across spatial domains. CLOCC uses client-side caching, but client caches are limited, some workloads need very large caches, and maintaining cache coherence introduces extra overhead. Cloud SQL Server avoids 2PC by restricting a transaction to data on a single node. Akkio moves data between data centers to increase locality as workloads change, but it does not provide transactional guarantees.

6 Conclusion

In cross-spatial-domain scenarios, the network delays between nodes are longer and more varied, and traditional transaction processing algorithms face new challenges. After analyzing the problems and optimization space of distributed transactions, this paper proposes Harp, a distributed transaction optimization algorithm for cross-spatial domains. Harp delays the execution of some sub-transactions based on network differences, reduces the lock contention time of transactions, and improves the utilization of lock resources, significantly improving the performance of distributed databases. As a preliminary study on cross-spatial-domain distributed transaction optimization, Harp can serve as a reference for subsequent research on cross-domain data management.

About the Author

Zhuang Qiyu (2000-), male, is a doctoral candidate at the School of Information, Renmin University of China. His main research directions are distributed database systems and transaction processing.

Li Tong (1989-), male, Ph.D., associate professor at the School of Information, Renmin University of China. His main research directions are new generation Internet architecture, cross-domain data management and big data.

Lu Wei (1981-), male, Ph.D., is a professor and doctoral supervisor at the School of Information, Renmin University of China, and a member of the Database Professional Committee of the China Computer Federation. His main research directions include basic database theory, big data system development, query processing in the context of time and space, and Cloud database systems and applications.

Du Xiaoyong (1963-), male, Ph.D., is a second-level professor and doctoral supervisor in the School of Information, Renmin University of China. His main research directions are database systems, big data management and analysis, and intelligent information retrieval.
