Greenplum Concurrency Management

Concurrency management involves two aspects: ensuring the consistency of concurrent transactions, and managing resources effectively through resource queues.

Greenplum's concurrency and transaction management builds on PostgreSQL's single-node MVCC (Multi-Version Concurrency Control) and snapshot features, and adds its own complete distributed transaction management mechanism. PostgreSQL assigns a transaction ID to each transaction, and records the ID of the executing transaction in hidden fields on each data row that the transaction operates on.

For example, when a row is modified, a new version of the row is created, and a snapshot of the system is taken when a new transaction or statement starts executing. The snapshot records the status and start time of the transactions in the system. Subsequent operations use this information to decide which row versions to read and modify, so that concurrent inserts, deletes, and updates can be completed efficiently.
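
The visibility rule described above can be sketched in a few lines of Python. This is a simplified model, not PostgreSQL's actual implementation: the names `RowVersion`, `Snapshot`, `committed_before`, and `is_visible` are hypothetical, the set of committed transaction IDs is passed in explicitly, and a transaction's view of its own uncommitted writes is ignored.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass
class RowVersion:
    value: str
    xmin: int                    # ID of the transaction that created this version
    xmax: Optional[int] = None   # ID of the transaction that deleted or replaced it


@dataclass(frozen=True)
class Snapshot:
    xmax: int                    # first transaction ID not yet assigned at snapshot time
    in_progress: FrozenSet[int]  # transactions still running at snapshot time


def committed_before(xid: int, snap: Snapshot, committed: Set[int]) -> bool:
    """True if transaction xid had already committed when snap was taken."""
    return xid < snap.xmax and xid not in snap.in_progress and xid in committed


def is_visible(row: RowVersion, snap: Snapshot, committed: Set[int]) -> bool:
    """A version is visible if its creator committed before the snapshot
    and no transaction that committed before the snapshot removed it."""
    if not committed_before(row.xmin, snap, committed):
        return False
    if row.xmax is None:
        return True
    return not committed_before(row.xmax, snap, committed)


# Transaction 100 inserted "v1"; transaction 101 later replaced it with "v2".
committed = {100, 101}
old_version = RowVersion("v1", xmin=100, xmax=101)
new_version = RowVersion("v2", xmin=101)

# Snapshot taken while 101 was still running: it keeps seeing "v1".
early = Snapshot(xmax=102, in_progress=frozenset({101}))
# Snapshot taken after 101 committed: it sees "v2".
late = Snapshot(xmax=102, in_progress=frozenset())

for name, snap in (("early", early), ("late", late)):
    seen = [r.value for r in (old_version, new_version) if is_visible(r, snap, committed)]
    print(name, seen)
```

Because readers consult their own snapshot instead of locking rows, a reader and a writer working on the same row do not block each other in this model.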

Greenplum builds global distributed transactions on top of this. It coordinates and synchronizes the work of each segment node by obtaining a global snapshot on the Master node, and uses two-phase commit to ensure that a transaction reaches a consistent state on all nodes.
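
As a rough illustration of two-phase commit, the sketch below models a coordinator (standing in for the Master) and a few participants (standing in for segments). It shows the general protocol only, not Greenplum's code: the `Segment` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names invented for this example, and failure handling such as timeouts and recovery is left out.

```python
class Segment:
    """A participant that votes in phase 1 and obeys the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: simulates a local failure when False
        self.state = "idle"

    def prepare(self):
        # Phase 1: promise to commit if asked, or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(segments):
    """Return True if the transaction committed on every segment."""
    # Phase 1: collect a vote from every segment.
    votes = [seg.prepare() for seg in segments]

    # Phase 2: commit only if all segments voted yes, otherwise roll back everywhere.
    if all(votes):
        for seg in segments:
            seg.commit()
        return True
    for seg in segments:
        seg.rollback()
    return False


healthy = [Segment("seg0"), Segment("seg1"), Segment("seg2")]
print(two_phase_commit(healthy), [s.state for s in healthy])

one_failing = [Segment("seg0"), Segment("seg1", can_commit=False)]
print(two_phase_commit(one_failing), [s.state for s in one_failing])
```

The point of the two phases is that no segment commits until every segment has promised it can, so the nodes never end up with some committed and others rolled back.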

The idea behind resource queues can be seen everywhere in daily life. At a bank or hotel with a limited number of counters, if everyone crowds in at once, no customer is likely to be served well; if there is an orderly line and customers come to the counter one by one, the experience is much better for everyone. In the same way, database resources are limited: if users submit too many statements at the same moment, system resources are divided too finely and every user's operation becomes very slow. A resource queue solves this problem well. When creating a resource group, you specify a concurrency value that limits the maximum number of statements that can execute in the resource group at the same time. Other requests must wait in the queue and are woken up when any running statement finishes.
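
The admission behavior described above can be modeled with a small, deterministic Python sketch. It is an illustration under assumptions, not Greenplum's scheduler: the `ResourceQueue` class, its `concurrency` parameter, and the `submit` and `finish` methods are hypothetical names chosen for this example.

```python
from collections import deque


class ResourceQueue:
    """Admit at most `concurrency` statements; park the rest in FIFO order."""

    def __init__(self, concurrency):
        self.concurrency = concurrency
        self.running = set()
        self.waiting = deque()

    def submit(self, statement):
        """Return True if the statement starts now, False if it must wait."""
        if len(self.running) < self.concurrency:
            self.running.add(statement)
            return True
        self.waiting.append(statement)
        return False

    def finish(self, statement):
        """Mark a statement as done and wake the oldest waiter, if any."""
        self.running.remove(statement)
        if self.waiting:
            woken = self.waiting.popleft()
            self.running.add(woken)
            return woken
        return None


queue = ResourceQueue(concurrency=2)
for stmt in ("q1", "q2", "q3", "q4"):
    print(stmt, "runs" if queue.submit(stmt) else "waits")

# Finishing a running statement frees a slot for the oldest waiting one.
print("woken:", queue.finish("q1"))
print("woken:", queue.finish("q2"))
```

Using a FIFO structure for the waiters means statements are released in the order they were submitted, which is the fairness property a queue limit is meant to provide.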

Resource queues are most commonly used in analytical scenarios. Some analytical statements require a lot of resources, and limiting the queue lets statements finish quickly one after another, instead of all finishing slowly at almost the same time. The queue limit is also very helpful for transactional statements, because it prevents the system from becoming unstable under a large number of concurrent accesses. The queue limit provides a fair mechanism: queued queries are released in the order in which they were submitted.

Some users worry that, because a distributed system such as Greenplum introduces distributed transactions, short transactional statements will suffer high latency from the overhead of those transactions. Recently, the Greenplum community analyzed this part of the code. It optimized lock management by reducing the scenarios in which transaction locks conflict, and incorporated improvements from later versions of PostgreSQL, such as greatly reducing the cost of acquiring locks and grouping the commits of multiple transactions so that the transaction log is written once for the whole group. After this series of optimizations, Greenplum handles short queries comfortably.

We compared stand-alone PostgreSQL 10 with a four-node Greenplum 6-dev cluster, executing simple transactional query statements under the same concurrency of 120. Greenplum did not fall behind PostgreSQL. For example, when no index is used, Greenplum's TPS can reach four times that of stand-alone PostgreSQL. In the figure for this comparison, the horizontal axis is the number of concurrent connections and the vertical axis is TPS; the black line is the PostgreSQL data, and the gray line is the TPS of Greenplum using a resource group. In this case, Greenplum performs very well.


Origin blog.csdn.net/MyySophia/article/details/113796900