Local Message Table Solution for Distributed Transactions (A Practical Case: Cross-Region Transfers)

1. Preamble

1. Current situation

Recently I have been working on a cross-region transfer feature. First, some background. The company's business is mainly in Singapore, Hong Kong and Dubai. To meet each region's regulatory and compliance requirements, transaction, card, account and other data must be physically separated by region. For database sharding we chose the middleware Sharding-Proxy. The shard key is the region code, and every sharded table carries a region-code column.

For transfers, account postings and transaction-record reads and writes within a single region, Sharding-Proxy routes every database request to the physical database identified by the local region code. A same-region transfer can therefore run synchronously in one local transaction without any problem.
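As a minimal sketch of why the same-region case is easy: once Sharding-Proxy has routed both accounts to the same physical database (modelled here by a single in-memory sqlite3 connection, an assumption for illustration), the debit and credit are just two updates inside one local transaction. The table and function names are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # Hypothetical schema: both accounts live in the same regional database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    with conn:
        conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
    return conn


def transfer_same_region(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Debit src and credit dst atomically; both changes roll back on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ? AND balance >= ?",
                (amount, src, amount),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False
```

The cross-region case breaks exactly this assumption: the two `UPDATE`s would land in different physical databases.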

2. Problem

Problems arise with cross-region transfers, for example from Hong Kong to Dubai or from Singapore to Dubai. Although Sharding-Proxy itself supports distributed transactions, we had no prior hands-on experience with distributed transactions that span physical databases.

In addition, Dubai is far from Singapore and Hong Kong, so network latency is relatively high even over a dedicated line. A transfer reads and writes multiple tables in two different regions, and every balance change also takes a database row lock (exclusive lock) on the affected account. Such a large transaction easily times out, and the slow interface response hurts the user experience.

We therefore decided to split the debit (outgoing) side from the credit (incoming) side and to process the credit asynchronously through MQ.

This raises a new problem: once the two sides no longer run in one local transaction, the database can no longer guarantee ACID across them. How do we solve the distributed transaction problem?


2. Exploring solutions

Seata is an open-source distributed transaction solution that provides the AT, TCC, SAGA and XA transaction modes. However, it is relatively heavyweight and opaque from the outside, and it requires deploying seata-server, which increases maintenance costs.

In the end, I chose a relatively lightweight solution: MQ + local message table + retry/resend. The general process is as follows.

First, the debit side: the debit record, the actual debit and the local message record are written in the same local transaction. As long as the debit succeeds, the local message table is guaranteed to contain the corresponding transfer message.
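A minimal sketch of the debit-side transaction, assuming a single sqlite3 database and hypothetical table names (`account`, `debit_record`, `local_message`). The point is that the message row commits or rolls back together with the debit; publishing to MQ happens only after the commit.

```python
import json
import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE debit_record (id TEXT PRIMARY KEY, account_id TEXT, amount INTEGER);
        CREATE TABLE local_message (
            msg_id  TEXT PRIMARY KEY,
            payload TEXT NOT NULL,
            status  TEXT NOT NULL DEFAULT 'undeleted'
        );
        INSERT INTO account VALUES ('A', 100);
    """)
    return conn


def debit(conn: sqlite3.Connection, account_id: str, to_account: str, amount: int) -> str:
    """Debit, debit record and outgoing message are one local transaction."""
    msg_id = str(uuid.uuid4())
    payload = json.dumps({"msg_id": msg_id, "from": account_id, "to": to_account, "amount": amount})
    with conn:
        cur = conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("insufficient funds")  # nothing below is written
        conn.execute("INSERT INTO debit_record VALUES (?, ?, ?)", (msg_id, account_id, amount))
        conn.execute("INSERT INTO local_message (msg_id, payload) VALUES (?, ?)", (msg_id, payload))
    # After commit the caller tries to publish `payload` to MQ. If that send
    # fails, the row stays 'undeleted' and the compensation task resends it.
    return msg_id
```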

Next, the credit side: after the credit succeeds, it sends a credit-confirmation message to ack_queue.

When the debit side consumes this confirmation, the credit is known to have succeeded, and the corresponding record in the local message table is logically deleted (its status is set to deleted).

If the message fails to be sent, or the consumer fails to credit the account, the record in the local message table stays in the undeleted status. A scheduled compensation task on the debit side polls the local message table for undeleted messages and resends them. To avoid a message backlog, after a certain number of retries the task stops sending the message to MQ and sends an alert email to the developers for manual handling.
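A sketch of one run of the compensation task, assuming a `retries` column on the local message table and a hypothetical `MAX_RETRIES` value (the article does not give the real threshold). `send` and `alert` are placeholders for the MQ producer and the alert email.

```python
import sqlite3
from typing import Callable

MAX_RETRIES = 5  # hypothetical threshold; the article does not state the real value


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE local_message (
               msg_id  TEXT PRIMARY KEY,
               payload TEXT NOT NULL,
               status  TEXT NOT NULL DEFAULT 'undeleted',
               retries INTEGER NOT NULL DEFAULT 0)"""
    )
    return conn


def compensate_once(
    conn: sqlite3.Connection,
    send: Callable[[str], object],
    alert: Callable[[str], object],
) -> None:
    """One run of the scheduled compensation task on the debit side."""
    rows = conn.execute(
        "SELECT msg_id, payload, retries FROM local_message WHERE status = 'undeleted'"
    ).fetchall()
    for msg_id, payload, retries in rows:
        if retries >= MAX_RETRIES:
            continue  # already alerted; left for manual handling
        send(payload)  # only a successful credit marks the row deleted, so the result is ignored
        retries += 1
        with conn:
            conn.execute("UPDATE local_message SET retries = ? WHERE msg_id = ?", (retries, msg_id))
        if retries >= MAX_RETRIES:
            alert(msg_id)  # e.g. send the alert email to the developers
```

The alert fires exactly once per message, on the run that reaches the limit; later runs skip the row.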

Note: the credit side must consume messages idempotently, otherwise the same transfer may be credited more than once. This can be handled by adding a distributed lock.
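The article handles idempotency with a distributed lock. The sketch below shows a different, commonly used technique instead (or in addition): a dedup table keyed by `msg_id` that is written in the same transaction as the credit, so a redelivered message hits the primary-key constraint and the credit rolls back. Table and function names are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE consumed_message (msg_id TEXT PRIMARY KEY);
        INSERT INTO account VALUES ('B', 0);
    """)
    return conn


def consume_credit(conn: sqlite3.Connection, msg_id: str, account_id: str, amount: int) -> bool:
    """Apply a credit at most once per msg_id; False means it was a duplicate."""
    try:
        with conn:
            # A second insert of the same msg_id violates the PRIMARY KEY, so the
            # whole transaction (including the credit) is rolled back.
            conn.execute("INSERT INTO consumed_message VALUES (?)", (msg_id,))
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, account_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```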


3. Adjustments for the actual business

In practice our system is more complicated. An ordinary transfer goes from account A to account B, but in our system account A may also share the balance of an enterprise (corporate) account. Cross-region transfers are more complicated still, because each region also has its own public account. The flow of funds is as follows:
[Figure: Fund change process for cross-region transfers]

Suppose personal account A in Hong Kong transfers to personal account B in Dubai, and A and B share the balances of corporate accounts CA and CB respectively. The complete flow of fund changes is then:

  1. Corporate account CA automatically transfers funds to personal account A.
  2. Personal account A transfers to public account PA (Hong Kong).
  3. Public account PB (Dubai) transfers to personal account B.
  4. Personal account B automatically transfers to corporate account CB.
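The four steps can be sketched as two groups of legs: steps 1 and 2 run in the Hong Kong database (debit side), steps 3 and 4 in the Dubai database (credit side). As a simplifying assumption, the sketch has A's own balance at zero, so the whole amount is first pulled from CA; the `Leg` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    src: str
    dst: str
    amount: int


def cross_region_legs(amount: int) -> tuple[list[Leg], list[Leg]]:
    """Split a Hong Kong A -> Dubai B transfer into debit-side and credit-side legs."""
    debit_side = [Leg("CA", "A", amount), Leg("A", "PA", amount)]   # Hong Kong database
    credit_side = [Leg("PB", "B", amount), Leg("B", "CB", amount)]  # Dubai database
    return debit_side, credit_side


def apply_legs(balances: dict[str, int], legs: list[Leg]) -> None:
    """All-or-nothing within one side: work on a copy, write back only on success."""
    new = dict(balances)
    for leg in legs:
        if new[leg.src] < leg.amount:
            raise ValueError(f"insufficient funds in {leg.src}")
        new[leg.src] -= leg.amount
        new[leg.dst] += leg.amount
    balances.update(new)
```

Net effect: the debit side moves the amount from CA to PA, and the credit side moves it from PB to CB. Each side is atomic on its own, which is why the two sides need the local message table to stay consistent.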

1. The scheduled compensation task scans the cache instead of the table

Polling the local message table directly has a drawback. Our local message table is a broadcast table (the physical database in every region holds identical data), so each query is routed to a randomly chosen regional database. If the query happens to hit the Dubai physical database, it is noticeably slower.

Therefore, whenever we write a record to the local message table, we also write it to Redis, and the compensation task scans the cache instead of the table.

Likewise, after the credit consumer processes a credit successfully, it must delete the corresponding Redis record in addition to logically deleting the local message table record (setting it to deleted).
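A minimal sketch of the cache-backed index of pending messages. `PendingCache` is a hypothetical in-memory stand-in for Redis (with real Redis one might use a hash or sorted set keyed by `msg_id`; that choice is an assumption). The compensation task only reads the cache, never the broadcast table.

```python
import sqlite3


class PendingCache:
    """Hypothetical in-memory stand-in for the Redis cache: msg_id -> payload."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, msg_id: str, payload: str) -> None:
        self._data[msg_id] = payload

    def delete(self, msg_id: str) -> None:
        self._data.pop(msg_id, None)

    def scan(self) -> list[tuple[str, str]]:
        return list(self._data.items())


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE local_message (msg_id TEXT PRIMARY KEY, payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'undeleted')"
    )
    return conn


def write_message(conn: sqlite3.Connection, cache: PendingCache, msg_id: str, payload: str) -> None:
    with conn:  # in the real flow this is the same transaction as the debit
        conn.execute("INSERT INTO local_message (msg_id, payload) VALUES (?, ?)", (msg_id, payload))
    cache.put(msg_id, payload)  # written after commit; this step can fail on its own


def on_credit_success(conn: sqlite3.Connection, cache: PendingCache, msg_id: str) -> None:
    with conn:
        conn.execute("UPDATE local_message SET status = 'deleted' WHERE msg_id = ?", (msg_id,))
    cache.delete(msg_id)


def compensate_from_cache(cache: PendingCache, send) -> None:
    """The scheduled task resends whatever is still cached; no table scan."""
    for _msg_id, payload in cache.scan():
        send(payload)
```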

Problem: the local message table and the Redis cache can become inconsistent. If the table write succeeds but the cache write fails, the compensation task scanning the cache will never see that record. Cache consistency therefore has to be guaranteed here.
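The article does not say how consistency is guaranteed. One possible mitigation, shown here only as a hedged sketch and not as the author's method, is a low-frequency reconciliation job: it re-adds undeleted rows missing from the cache and evicts cache entries whose row is already deleted. This reintroduces an occasional table scan, but far less often than the regular compensation task. A plain dict stands in for Redis.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE local_message (msg_id TEXT PRIMARY KEY, payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'undeleted')"
    )
    return conn


def reconcile(conn: sqlite3.Connection, cache: dict) -> tuple[list[str], list[str]]:
    """Make the cache match the undeleted rows; return (re-added ids, evicted ids)."""
    undeleted = dict(
        conn.execute(
            "SELECT msg_id, payload FROM local_message WHERE status = 'undeleted' ORDER BY msg_id"
        ).fetchall()
    )
    # Table write succeeded but cache write failed: put the entry back.
    added = [m for m in undeleted if m not in cache]
    for msg_id in added:
        cache[msg_id] = undeleted[msg_id]
    # Row already marked deleted but cache delete failed: evict the stale entry.
    removed = [m for m in sorted(cache) if m not in undeleted]
    for msg_id in removed:
        del cache[msg_id]
    return added, removed
```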

2. Process the debit asynchronously

On the debit side, funds move from corporate account CA (whose balance is shared) to personal account A and then to public account PA, so three accounts have balance changes and debit records along the way. Returning the result synchronously would lengthen the response time and hurt the user experience, so the debit operation is made asynchronous with threads.
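A minimal sketch of the asynchronous debit, using Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in for the article's thread pool. `debit_legs` is a hypothetical placeholder for the CA → A → PA balance changes; the request handler gets a `Future` back immediately and can answer "processing" to the user.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def debit_legs(transfer_id: str, amount: int, balances: dict, log: list) -> str:
    """Hypothetical slow debit: CA -> A -> PA, recording one debit entry per step.

    Shared dicts are mutated without a lock here; in the real system each debit
    is one database transaction, which provides the isolation.
    """
    for src, dst in (("CA", "A"), ("A", "PA")):
        balances[src] -= amount
        balances[dst] += amount
        log.append((transfer_id, src, dst, amount))
    return transfer_id


def submit_transfer(pool: ThreadPoolExecutor, transfer_id: str, amount: int,
                    balances: dict, log: list) -> Future:
    """Return at once; the debit itself runs on a pool thread."""
    return pool.submit(debit_legs, transfer_id, amount, balances, log)
```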

3. Remove ack_queue

Currently the debit side and the credit side are not separate systems, and the database has not been split vertically by business. So ack_queue and its confirmation message can be removed: after crediting the account successfully, the consumer operates on the database directly, setting the local message table record's status to deleted and deleting the cache entry.
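A sketch of the consumer after ack_queue is removed: it credits the account, then marks the local message deleted and evicts the cache entry itself. The schema, the dict standing in for Redis and the function name are assumptions. The credit and the status update are kept as separate steps, as they would be if they ran in different regional databases; if the process dies between them, the message is resent, which is why the idempotency requirement from section 2 still applies.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE local_message (msg_id TEXT PRIMARY KEY,
                                    status TEXT NOT NULL DEFAULT 'undeleted');
        INSERT INTO account VALUES ('B', 0);
        INSERT INTO local_message (msg_id) VALUES ('m1');
    """)
    return conn


def handle_credit_message(conn: sqlite3.Connection, cache: dict,
                          msg_id: str, account_id: str, amount: int) -> None:
    # 1. The credit itself.
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, account_id))
    # 2. No ack_queue: the consumer finishes the bookkeeping directly.
    with conn:
        conn.execute("UPDATE local_message SET status = 'deleted' WHERE msg_id = ?", (msg_id,))
    # 3. Evict the pending entry so the compensation task stops resending it.
    cache.pop(msg_id, None)
```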

4. Keep retrying when the credit fails

Cross-region transfers have a special case: the public account may not hold enough funds, so the transaction fails. If the receiving region's public account lacks funds, the credit fails, yet the sender has already been debited successfully, so the credit must eventually succeed. The scheduled task therefore keeps retrying the Pending records in the local message table until the receiving public account has enough funds to deduct and the credit transaction succeeds.
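A sketch of this retry-until-funded behaviour, assuming the local message row stores the amount and uses a `Pending` status (the schema and names are hypothetical). Unlike the general compensation task, there is no retry cap: a row leaves `Pending` only when the credit commits. Leg 4 (B to CB) is omitted for brevity.

```python
import sqlite3


def open_db(pb_balance: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE local_message (msg_id TEXT PRIMARY KEY, amount INTEGER NOT NULL,
                                    status TEXT NOT NULL);
        INSERT INTO local_message VALUES ('m1', 100, 'Pending');
    """)
    with conn:
        conn.executemany("INSERT INTO account VALUES (?, ?)", [("PB", pb_balance), ("B", 0)])
    return conn


def try_credit(conn: sqlite3.Connection, msg_id: str, amount: int) -> bool:
    """PB -> B plus marking the message done, all in one transaction."""
    try:
        with conn:
            cur = conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = 'PB' AND balance >= ?",
                (amount, amount),
            )
            if cur.rowcount != 1:
                raise ValueError("public account PB has insufficient funds")
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = 'B'", (amount,))
            conn.execute("UPDATE local_message SET status = 'deleted' WHERE msg_id = ?", (msg_id,))
        return True
    except ValueError:
        return False  # stays Pending; the next scheduler tick tries again


def retry_pending(conn: sqlite3.Connection) -> int:
    """One scheduler tick: retry every Pending message, with no retry limit."""
    rows = conn.execute("SELECT msg_id, amount FROM local_message WHERE status = 'Pending'").fetchall()
    return sum(try_credit(conn, msg_id, amount) for msg_id, amount in rows)
```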

Remark: in the actual business, cross-region transfers arrive with a delay. Although the system-level debit and credit only adjust book balances, the real money (for example, from Hong Kong to Dubai) only reaches the account after settlement with the local region.


4. Possible system bottlenecks

1. Row-lock wait timeouts on each region's public account

Currently each region has only one public account, and every cross-region transfer changes the public account balances of both the sending and the receiving region. Each balance change takes a database row lock (exclusive lock) on the account. Transfers are not very frequent, but under concurrency many threads may end up waiting for the row lock to be released, and those lock waits may time out.
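A toy model of the hot-row problem, not real database behaviour: one `threading.Lock` stands in for the row lock on a region's single public account, and `acquire(timeout=...)` stands in for the database's lock-wait timeout. The class name and timeout value are hypothetical. Every transfer in a region serialises on this one lock, and the longer each holder keeps it (for example across slow cross-region calls), the more waiters time out.

```python
import threading
import time


class HotAccount:
    """One public account per region, modelled as a single lock (hypothetical)."""

    def __init__(self, balance: int, lock_wait_timeout: float) -> None:
        self.balance = balance
        self._row_lock = threading.Lock()
        self._timeout = lock_wait_timeout

    def change(self, delta: int, hold_seconds: float = 0.0) -> bool:
        """Apply delta; return False if waiting for the lock times out."""
        if not self._row_lock.acquire(timeout=self._timeout):
            return False  # analogous to a row-lock wait timeout
        try:
            time.sleep(hold_seconds)  # simulates a long-running transaction holding the lock
            self.balance += delta
            return True
        finally:
            self._row_lock.release()
```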

2. The asynchronous debit thread pool may be undersized

Because a debit involves balance changes on several accounts plus business-table records, it takes relatively long, so we run it asynchronously to improve the user experience. However, the pool's core thread count, blocking queue length and maximum thread count are hard to size up front and still need tuning against the actual request volume. The rejection policy we set runs rejected tasks on the calling (main) thread.
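Python's `ThreadPoolExecutor` has an unbounded queue and no rejection policy, so this sketch adds both: a `BoundedSemaphore` caps running plus queued tasks, and overflow runs on the submitting thread, which mirrors the behaviour of Java's `ThreadPoolExecutor.CallerRunsPolicy`. `CallerRunsExecutor` and its sizes are hypothetical placeholders, not tuned values.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class CallerRunsExecutor:
    """Thread pool with a bounded backlog; overflow runs on the caller's thread."""

    def __init__(self, max_workers: int, queue_size: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # Capacity = tasks running + tasks waiting in the pool's queue.
        self._slots = threading.BoundedSemaphore(max_workers + queue_size)

    def submit(self, fn, *args) -> Future:
        if not self._slots.acquire(blocking=False):
            # Rejected: run synchronously in the caller. This slows the caller
            # down, which acts as back-pressure on incoming requests.
            fut: Future = Future()
            try:
                fut.set_result(fn(*args))
            except Exception as exc:
                fut.set_exception(exc)
            return fut
        fut = self._pool.submit(fn, *args)
        fut.add_done_callback(lambda _f: self._slots.release())
        return fut

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

The trade-off is the same as in Java: no task is dropped, but under overload the request thread itself does the debit, so response times grow instead of tasks being rejected.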


Source: blog.csdn.net/lingbomanbu_lyl/article/details/129782068