E-commerce system architecture design series (6): What issues should be considered in the design of the "account system" of e-commerce?

In the last article, I left you a question to think about: how should the account system of an e-commerce platform be designed?

In today's article, let's talk about the account system of e-commerce.

Introduction

The account system records and manages the balance of each user account. This balance is the money that a user temporarily keeps with the e-commerce company, and it can come from sources such as top-ups by the user or refunds for returned goods.

Account systems are used far beyond e-commerce. Internet content providers, online game operators, telecom operators, and others all need an account system to manage user balances or virtual currency. Even the core system of a bank contains an account system.

From the perspective of business requirements, the data model of a minimal account system can be represented by a single table.

This table includes three fields: user ID, account balance, and update time. On every trade, it is enough to update the balance of the account identified by the user ID.
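As a rough illustration of this minimal model, here is a sketch in Python. It uses the standard library's SQLite module as a stand-in for MySQL, and the names `create_minimal_account_table`, `apply_amount`, and `updated_at` are assumptions made for this example, not part of the original design.

```python
import sqlite3
from datetime import datetime, timezone


def create_minimal_account_table(conn: sqlite3.Connection) -> None:
    """Create the three-field table: user ID, balance, update time."""
    conn.execute(
        "CREATE TABLE account_balance ("
        " user_id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL,"      # amount in the smallest currency unit
        " updated_at TEXT NOT NULL)"
    )


def apply_amount(conn: sqlite3.Connection, user_id: int, amount: int) -> int:
    """Add `amount` (may be negative) to one user's balance and return the new balance."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE account_balance SET balance = balance + ?, updated_at = ? "
            "WHERE user_id = ?",
            (amount, now, user_id),
        )
        if cur.rowcount != 1:
            raise KeyError(f"no account for user {user_id}")
    row = conn.execute(
        "SELECT balance FROM account_balance WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
create_minimal_account_table(conn)
conn.execute("INSERT INTO account_balance VALUES (0, 100, '2023-07-16T09:40:37')")
print(apply_amount(conn, 0, 100))  # balance after a top-up of 100
```

This is deliberately the "balance only" version; the rest of the article explains why it is not enough on its own.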

Why do the accounts never reconcile?

No account system exists in isolation; at the very least it is closely tied to the financial, order, and trading systems. Ideally, the data within the account system is self-consistent: the sum of all users' account balances equals the total balance of the e-commerce company's dedicated bank account. The data in the account system should also match the data in other systems. For example, each user's balance should match the top-up records in the trading system and the orders in the order system.

However, because of the complexity of the business and the systems, the reality is that very few account systems reconcile perfectly on every account. Therefore, any system of reasonable size has a dedicated reconciliation system to check and correct data differences between the account system and other systems.

There are many reasons why accounts fail to reconcile, such as business changes, manual modification of data, and failed data exchange between systems. As system designers, we only need to focus on how to avoid reconciliation failures that have technical causes. Which causes are technical? For example: network request errors, server downtime, and system bugs.

"The accounts do not reconcile" is the everyday way of putting it. In essence, it is a problem of keeping redundant data consistent.

Redundant data here does not mean superfluous or duplicated data. It means multiple pieces of data that contain the same information. For example, we can calculate a user's current account balance from the records of each top-up and the order data for each purchase. In other words, the account balance and the transaction records related to the account both contain the information "account balance", so they are mutually redundant.

When designing a system's storage, the principle is to avoid storing redundant data. First, it wastes storage space. Second, keeping redundant data consistent is troublesome. In some scenarios, however, storing redundant data is necessary, and the balance of a user account is one of them.

The balance is used very frequently during trades. It is impractical to recompute the current balance from all historical transaction records before every trade: that would be too slow, and the performance could not meet the needs of trading. Therefore, the account system stores each user's balance, which is a design that trades storage space for computing time.
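To make the redundancy concrete, here is a small sketch in Python. The `LogEntry` type and the `balance_from_log` helper are hypothetical names invented for this illustration; amounts are assumed to be integers in the smallest currency unit, positive for money coming in and negative for money going out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LogEntry:
    log_id: int
    user_id: int
    amount: int  # positive = top-up or refund, negative = purchase


def balance_from_log(entries: Iterable[LogEntry], user_id: int) -> int:
    """Recompute a balance by scanning every record: correct, but O(history)."""
    return sum(e.amount for e in entries if e.user_id == user_id)


entries = [LogEntry(1, 0, 100), LogEntry(2, 0, -30), LogEntry(3, 1, 50)]

# The stored balance is the redundant copy: an O(1) lookup that must be
# kept equal to what the records imply.
stored_balance = {0: 70, 1: 50}

for uid, balance in stored_balance.items():
    print(uid, balance, balance_from_log(entries, uid))
```

The stored dictionary answers a balance query immediately, while the scan grows with the length of the history, which is the space-for-time trade described above.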

If the goal were only to meet the functional requirements, the account system could record just the balance and update it on every trade. The problem is that if a balance is tampered with, there is no way to trace it. Therefore, alongside the balance, the system must also record every trade, that is, the account's transaction log (also called the journal). The data model of the transaction log needs to include at least: the log ID (serial number), the transaction amount, the transaction timestamp, and information about both parties to the trade, such as their systems, accounts, and transaction numbers.

Although the transaction log and the balance are also mutually redundant data, keeping the log makes it possible to correct balance errors caused by system bugs or manual tampering, and it makes it easier for the account system to reconcile with external systems. For these reasons, it is essential for the account system to keep a transaction log.

When designing the transaction log, there are several important principles that must be followed, and it is best to enforce them by technical means.

  1. Log records are append-only. Once a record has been written successfully, it must never be modified or deleted. Even if a completed trade has to be cancelled for a legitimate reason, the original log record must not be deleted. The correct approach is to record another "cancel" trade.
  2. The log ID (serial number) must be increasing, because we use it to determine the order of trades.
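The two principles above can be sketched in a few lines of Python. This is only an in-memory illustration under assumed names (`JournalEntry`, `AppendOnlyJournal`, and the `cancels` field are all hypothetical); a real system would enforce the same rules in the database.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)  # frozen: a written entry cannot be modified
class JournalEntry:
    log_id: int
    amount: int
    cancels: Optional[int] = None  # log_id of the entry this one reverses


class AppendOnlyJournal:
    """Exposes append and cancel only; there is no update or delete."""

    def __init__(self) -> None:
        self._entries: List[JournalEntry] = []
        self._next_id = 1  # IDs only ever increase

    def append(self, amount: int, cancels: Optional[int] = None) -> JournalEntry:
        entry = JournalEntry(self._next_id, amount, cancels)
        self._entries.append(entry)
        self._next_id += 1
        return entry

    def cancel(self, log_id: int) -> JournalEntry:
        """Cancel a trade by appending a reversing entry, never by deleting."""
        original = next((e for e in self._entries if e.log_id == log_id), None)
        if original is None:
            raise KeyError(log_id)
        if any(e.cancels == log_id for e in self._entries):
            raise ValueError(f"entry {log_id} is already cancelled")
        return self.append(-original.amount, cancels=log_id)

    def entries(self) -> Tuple[JournalEntry, ...]:
        return tuple(self._entries)  # read-only view

    def balance(self) -> int:
        return sum(e.amount for e in self._entries)


journal = AppendOnlyJournal()
top_up = journal.append(100)
journal.append(50)
journal.cancel(top_up.log_id)
print(journal.entries(), journal.balance())
```

After the cancellation the journal still holds all three entries, so the full history remains auditable while the balance reflects the reversal.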

During reconciliation, if the transaction log and the balance disagree and business-level investigation cannot determine which record is wrong, the general rule is to correct the balance based on the transaction log, so that subsequent trades can still be reconciled.

So technically, how do we ensure that the transaction log and the balance data in the account system stay consistent?
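As a minimal sketch of "the log wins", the following Python function recomputes every balance from the log and reports the accounts whose stored balance disagrees. The function name `reconcile` and the plain tuple/dict data shapes are assumptions for this example.

```python
from typing import Dict, Iterable, Tuple


def reconcile(
    stored: Dict[int, int], log: Iterable[Tuple[int, int]]
) -> Tuple[Dict[int, int], Dict[int, Tuple[int, int]]]:
    """Return (corrected balances, mismatches).

    `stored` maps user_id -> stored balance; `log` yields (user_id, amount).
    `mismatches` maps user_id -> (stored balance, balance implied by the log).
    """
    expected: Dict[int, int] = {}
    for user_id, amount in log:
        expected[user_id] = expected.get(user_id, 0) + amount

    corrected: Dict[int, int] = {}
    mismatches: Dict[int, Tuple[int, int]] = {}
    for user_id in sorted(set(stored) | set(expected)):
        from_log = expected.get(user_id, 0)
        corrected[user_id] = from_log  # the log is treated as the source of truth
        if stored.get(user_id, 0) != from_log:
            mismatches[user_id] = (stored.get(user_id, 0), from_log)
    return corrected, mismatches


stored = {0: 200, 1: 50}
log = [(0, 100), (0, 100), (1, 50), (1, -20)]
print(reconcile(stored, log))
```

In practice a mismatch would normally be investigated before the balance is overwritten; the sketch only shows the fallback rule described above.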

Use database transactions to ensure data consistency

The service interface exposed to other systems must not offer a way to update the balance or the transaction log separately; it should offer only a trade function. When implementing a trade, we need to write the log record and modify the balance together, and we must do our best to guarantee that under all circumstances the two operations either both succeed or both fail. No trade may end up with the log written but the balance not updated, or with the balance updated but the log not written.

This sounds simple, but in practice it is very hard to achieve. After all, the application can only perform the two operations one after the other, and during execution all sorts of failures can occur, such as network errors and system downtime. It is therefore difficult for the application alone to guarantee that both operations succeed or both fail.

The database provides the transaction mechanism to solve exactly this problem. In fact, database transactions were originally designed to handle business transactions, which is why English uses the same word, "transaction", for both.

Let's look at how to implement a trade with a MySQL transaction. For example, to execute a 100 yuan top-up inside a transaction, we first write a log record with serial number 888 and then update the account balance from 100 yuan to 200 yuan. The corresponding SQL looks like this:

mysql> begin;  -- start the transaction
Query OK, 0 rows affected (0.00 sec)

mysql> insert into account_log ...;  -- write the transaction log record
Query OK, 1 row affected (0.01 sec)

mysql> update account_balance ...;  -- update the account balance
Query OK, 1 row affected (0.00 sec)

mysql> commit;  -- commit the transaction
Query OK, 0 rows affected (0.01 sec)

To use a transaction, you execute begin first to mark the start of the transaction, and then execute your SQL statements as usual. Inside the transaction you can run queries as well as statements that update data. Finally, you execute commit to commit the transaction.

Let's look at what guarantees a transaction gives us.

First, it guarantees that the two operations of writing the log record and updating the balance either both succeed or both fail. It can never happen that one table is updated while the other is not. This is the atomicity of transactions (Atomicity).

A transaction also ensures that the data in the database always moves from one consistent state (log record 888 does not exist, and the balance is 100 yuan) to another consistent state (log record 888 exists, and the balance is 200 yuan). Other transactions never observe an intermediate state (log record 888 exists, but the balance is still 100 yuan).

For any other transaction, at any moment, if it cannot read log record 888, the balance it reads must be 100 yuan, the state before the trade. If it can read record 888, the balance it reads must be 200 yuan, the state after the trade. In other words, transactions guarantee that the data we read (the balance and the transaction log) is always consistent. This is the consistency of transactions (Consistency).

In reality, however fast this transaction executes, it takes time, and the log table and the balance table are modified one after the other. There must be a moment when the log has been written but the balance has not yet been updated, which means the intermediate state of each transaction does actually exist.

To achieve consistency, the database must ensure that while a transaction is executing, its intermediate state is invisible to other transactions. For example, if transaction A has written record 888 but has not yet committed, other transactions should not be able to read record 888. This is the isolation of transactions (Isolation).

Finally, once a transaction has committed successfully, its data is persisted to disk. Even if the database goes down, the result of the transaction will not change. This is the durability of transactions (Durability).

You will notice that these are the four basic properties of transactions, ACID. Note that the four properties are closely interrelated, so there is no need to get hung up on the strict definition of each one. What matters more is understanding how transactions behave: when our system uses transactions, what effect a transaction has on your data in each situation. That is the key to using transactions well.
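The invisibility of uncommitted data can be demonstrated with two connections to the same database. This sketch again uses Python's `sqlite3` module rather than MySQL, with a temporary database file and a cut-down `account_log` table; the helper names are assumptions for the example.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def count_logs(conn: sqlite3.Connection) -> int:
    # fetchall() drains the cursor so no read lock is left behind
    return conn.execute("SELECT COUNT(*) FROM account_log").fetchall()[0][0]


def demo_isolation() -> Tuple[int, int]:
    """Return what a second session sees (before commit, after commit)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "accounts.db")
        writer = sqlite3.connect(path)  # plays the role of transaction A
        reader = sqlite3.connect(path)  # plays the role of another session
        try:
            writer.execute(
                "CREATE TABLE account_log ("
                " log_id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)"
            )
            writer.commit()

            # Transaction A writes record 888 but has not committed yet.
            writer.execute("INSERT INTO account_log VALUES (888, 100)")
            seen_before_commit = count_logs(reader)

            writer.commit()
            seen_after_commit = count_logs(reader)
            return seen_before_commit, seen_after_commit
        finally:
            writer.close()
            reader.close()


print(demo_isolation())
```

The second session sees no record until the writer commits, and sees it immediately afterwards. SQLite's locking model differs from InnoDB's, so this illustrates the behavior rather than MySQL's mechanism.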

Understand transaction isolation levels

With the transaction mechanism of the database, as long as we ensure that each transaction is executed in a transaction, our account system can easily ensure the consistency of the flow and balance data. However, ACID is a very strict definition, or an ideal situation. If ACID is to be fully satisfied, all transactions and SQL in a database can only be executed serially, which certainly cannot meet the requirements of general systems.

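For account systems and most other trading systems, the atomicity and durability of transactions must be guaranteed, otherwise there is no point in using transactions. Consistency and isolation, on the other hand, can be relaxed to some degree in exchange for performance. For this reason, MySQL provides four isolation levels. Take a look at this table:

+-----------------------+------------+---------------------+--------------+
| Isolation level       | Dirty read | Non-repeatable read | Phantom read |
+-----------------------+------------+---------------------+--------------+
| READ UNCOMMITTED (RU) | Possible   | Possible            | Possible     |
| READ COMMITTED (RC)   | No         | Possible            | Possible     |
| REPEATABLE READ (RR)  | No         | No                  | Possible     |
| SERIALIZABLE          | No         | No                  | No           |
+-----------------------+------------+---------------------+--------------+

Almost every article about MySQL transaction isolation levels includes this table, and we cannot avoid it either, because it is such a classic. Many readers feel a little dizzy when faced with so many concepts in one table, and it really is hard to take in. Let me explain how to make sense of the four isolation levels and where the key points are.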

From top to bottom, the table lists four isolation levels: RU, RC, RR, and SERIALIZABLE. Going down the list, isolation becomes stricter and performance becomes worse. The default isolation level in MySQL is RR, repeatable read.

Let me first cover the two levels that are rarely used. The first, RU, provides no real isolation at all. The intermediate state of every in-progress transaction is visible to other transactions, so "dirty reads" can occur. In the top-up example above, a dirty read means reading log record 888 while the balance is still the 100 yuan from before the top-up. Although this level performs well, the possibility of dirty reads is hard for applications to handle, so it is basically never used.

The fourth level, SERIALIZABLE, offers perfect isolation and consistency but has the worst performance, so it is also rarely used.

The isolation levels in common use are RC and RR, and RR is MySQL's default. Both avoid dirty reads: a transaction never reads data from another transaction that has not committed. Put plainly, as long as your transaction has not committed, its updates are invisible to other sessions, which still read the data as it was before your transaction's updates.

The only difference between RC and RR is "repeatable read". This concept is also a bit convoluted, but it is actually very simple.

The question is whether, during its execution, a transaction can read data updates made by other transactions that have since committed. If it can, that is a "non-repeatable read"; if it cannot, that is a "repeatable read".

Let's illustrate with an example. We set the transaction isolation level to RC. Session A starts a transaction and reads the account whose user ID is 0; the current account balance is 100 yuan.

mysql> -- Session A
mysql> -- confirm that the current isolation level is RC
mysql> SELECT @@global.transaction_isolation, @@transaction_isolation;
+--------------------------------+-------------------------+
| @@global.transaction_isolation | @@transaction_isolation |
+--------------------------------+-------------------------+
| READ-COMMITTED                 | READ-COMMITTED          |
+--------------------------------+-------------------------+
1 row in set (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select log_id, amount, timestamp  from account_log  order by log_id;
+--------+--------+---------------------+
| log_id | amount | timestamp           |
+--------+--------+---------------------+
|      3 |    100 | 2023-07-16 09:40:37 |
+--------+--------+---------------------+
1 row in set (0.00 sec)

mysql> select * from account_balance;  -- the account balance is 100 yuan
+---------+---------+---------------------+--------+
| user_id | balance | timestamp           | log_id |
+---------+---------+---------------------+--------+
|       0 |     100 | 2023-07-16 09:47:39 |      3 |
+---------+---------+---------------------+--------+
1 row in set (0.00 sec)

At this point, another session, B, completes a transfer into this account and commits its transaction, updating the account balance to 200 yuan.

mysql> -- Session B
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select log_id, amount, timestamp  from account_log  order by log_id;
+--------+--------+---------------------+
| log_id | amount | timestamp           |
+--------+--------+---------------------+
|      3 |    100 | 2023-07-16 09:40:37 |
+--------+--------+---------------------+
1 row in set (0.00 sec)

mysql> -- write the transaction log record
mysql> insert into account_log values (NULL, 100, NOW(), 1, 1001, NULL, 0, NULL, 0, 0);
Query OK, 1 row affected (0.00 sec)

mysql> -- update the balance
mysql> update account_balance
    -> set balance = balance + 100, log_id = LAST_INSERT_ID(), timestamp = NOW()
    -> where user_id = 0 and log_id = 3;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> -- the account now has 2 log records
mysql> select log_id, amount, timestamp  from account_log  order by log_id;
+--------+--------+---------------------+
| log_id | amount | timestamp           |
+--------+--------+---------------------+
|      3 |    100 | 2023-07-16 09:40:37 |
|      4 |    100 | 2023-07-16 10:06:15 |
+--------+--------+---------------------+
2 rows in set (0.00 sec)

mysql> -- the account balance is now 200 yuan
mysql> select * from account_balance;
+---------+---------+---------------------+--------+
| user_id | balance | timestamp           | log_id |
+---------+---------+---------------------+--------+
|       0 |     200 | 2023-07-16 10:06:16 |      4 |
+---------+---------+---------------------+--------+
1 row in set (0.00 sec)
mysql> commit;
Query OK, 0 rows affected (0.00 sec)

Note that the transaction opened earlier in session A is still open at this point. Let's read the account balance in session A again. What do you think it will be?

Let's look at the actual result.

mysql> -- Session A
mysql> -- the account now has 2 log records
mysql> select log_id, amount, timestamp  from account_log  order by log_id;
+--------+--------+---------------------+
| log_id | amount | timestamp           |
+--------+--------+---------------------+
|      3 |    100 | 2023-07-16 09:40:37 |
|      4 |    100 | 2023-07-16 10:06:15 |
+--------+--------+---------------------+
2 rows in set (0.00 sec)

mysql> -- the account balance is now 200 yuan
mysql> select * from account_balance;
+---------+---------+---------------------+--------+
| user_id | balance | timestamp           | log_id |
+---------+---------+---------------------+--------+
|       0 |     200 | 2023-07-16 10:06:16 |      4 |
+---------+---------+---------------------+--------+
1 row in set (0.00 sec)
mysql> commit;
Query OK, 0 rows affected (0.00 sec)

As you can see, with the isolation level set to RC, the second time session A reads the account balance it gets 200 yuan, the data updated by session B. For session A, reading the same piece of data twice within one transaction can give different results. This is a "non-repeatable read".

If the isolation level is set to RR, the second time session A reads the account balance it still gets 100 yuan, and the transaction log still shows only one record. Under RR, within a single transaction, every read of the same piece of data returns the same result, regardless of whether other sessions have updated that data. This is a "repeatable read".
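The difference between the two levels can be modeled with a toy versioned store in Python. This is a deliberately simplified model for building intuition, not a description of how InnoDB is implemented: every class and method name below is invented, and the toy takes its RR snapshot when the transaction object is created, whereas InnoDB establishes it at the first consistent read.

```python
from typing import Dict, List, Tuple


class ToyVersionedStore:
    """Keeps every committed version of each key, tagged with a commit sequence."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[int, int]]] = {}
        self._commit_seq = 0

    def commit_write(self, key: str, value: int) -> None:
        self._commit_seq += 1
        self._versions.setdefault(key, []).append((self._commit_seq, value))

    def latest_seq(self) -> int:
        return self._commit_seq

    def read_as_of(self, key: str, seq: int) -> int:
        visible = [value for s, value in self._versions.get(key, []) if s <= seq]
        if not visible:
            raise KeyError(key)
        return visible[-1]


class ToyTransaction:
    def __init__(self, store: ToyVersionedStore, level: str) -> None:
        if level not in ("RC", "RR"):
            raise ValueError(level)
        self._store = store
        self._level = level
        self._snapshot_seq = store.latest_seq()  # used only by RR

    def read(self, key: str) -> int:
        if self._level == "RC":
            seq = self._store.latest_seq()   # RC: see the latest committed data
        else:
            seq = self._snapshot_seq         # RR: keep reading the same snapshot
        return self._store.read_as_of(key, seq)


def read_twice(level: str) -> Tuple[int, int]:
    store = ToyVersionedStore()
    store.commit_write("balance:0", 100)
    session_a = ToyTransaction(store, level)
    first = session_a.read("balance:0")
    store.commit_write("balance:0", 200)     # session B commits its update
    second = session_a.read("balance:0")
    return first, second


print("RC:", read_twice("RC"))
print("RR:", read_twice("RR"))
```

Under the toy RC level the second read picks up session B's committed 200, while under RR both reads return 100, mirroring the session A results described above.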

Understanding the difference between the two isolation levels of RC and RR is enough to deal with most business scenarios.

Finally, a brief word on "phantom reads". Phantom reads are rarely encountered in real business scenarios, and even when they occur they basically do not affect the accuracy of the data, so a basic understanding is enough. Under the RR isolation level, from the moment we start a transaction until it ends, data updates made by other transactions are invisible to it, as we just discussed.

For example, we start a transaction in session A and prepare to insert a log record with ID 1000. We query the current log, find no record with ID 1000, and conclude that the record can be inserted safely.

mysql> -- Session A
mysql> select log_id from account_log where log_id = 1000;
Empty set (0.00 sec)

At this time, another session preemptively inserts the flow record with ID 1000.

mysql> -- Session B
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into account_log values
    -> (1000, 100, NOW(), 1, 1001, NULL, 0, NULL, 0, 0);
Query OK, 1 row affected (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

Then, when session A executes the same insert statement, it gets a primary key conflict error. Yet because of transaction isolation, when session A queries again it still cannot find a log record with ID 1000, as if it were seeing a "phantom". This is a phantom read.

mysql> -- Session A
mysql> insert into account_log values
    -> (1000, 100, NOW(), 1, 1001, NULL, 0, NULL, 0, 0);
ERROR 1062 (23000): Duplicate entry '1000' for key 'account_log.PRIMARY'
mysql> select log_id from account_log where log_id = 1000;
Empty set (0.00 sec)

With these isolation levels understood, here is a trade implementation that balances concurrency, performance, and data consistency. This implementation is safe under both the RC and RR isolation levels.

  1. Add a log_id column to the account balance table to record the serial number of the last trade.
  2. Start the transaction, then query and remember the current account balance and the serial number of the last trade.
  3. Write the transaction log record.
  4. Update the account balance, adding a condition to the WHERE clause of the update statement so that the update takes effect only if the serial number still equals the one queried earlier.
  5. Check the result of the balance update. Commit the transaction if the update succeeded; otherwise roll it back.

One point needs special attention: after updating the account balance, it is not enough to check that the update statement executed successfully. You must also check that the number of rows changed in the return value equals 1. Even if the serial numbers do not match and the balance is not updated, the update statement still executes successfully; it simply updates 0 rows.

The following is the SQL of the entire transaction for your reference:

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql>  -- query the current account balance and the serial number of the last trade
mysql> select balance, log_id from account_balance where user_id = 0;
+---------+--------+
| balance | log_id |
+---------+--------+
|     100 |      3 |
+---------+--------+
1 row in set (0.00 sec)

mysql>  -- insert the transaction log record
mysql> insert into account_log values
    -> (NULL, 100, NOW(), 1, 1001, NULL, 0, NULL, 0, 0);
Query OK, 1 row affected (0.01 sec)

mysql>  -- update the balance; note that the WHERE condition only allows the update when the serial number still equals 3, the value queried earlier
mysql> update account_balance
    -> set balance = balance + 100, log_id = LAST_INSERT_ID(), timestamp = NOW()
    -> where user_id = 0 and log_id = 3;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql>  -- check the update result here: commit only if the balance update succeeded (Changed: 1), otherwise roll back
mysql> commit;
Query OK, 0 rows affected (0.01 sec)
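The same five steps can be sketched from application code. The version below is only an illustration in Python using the built-in `sqlite3` module in place of MySQL, with trimmed-down tables; the function `trade` and its `seen_log_id` parameter (a hook for simulating a stale read in tests) are assumptions made for this example.

```python
import sqlite3
from typing import Optional


def setup_tables(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE account_log ("
        " log_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, amount INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE account_balance ("
        " user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL, log_id INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO account_log VALUES (3, 0, 100)")
    conn.execute("INSERT INTO account_balance VALUES (0, 100, 3)")
    conn.commit()


def trade(conn: sqlite3.Connection, user_id: int, amount: int,
          seen_log_id: Optional[int] = None) -> bool:
    """Return True if the trade committed, False if it was rolled back."""
    try:
        # Step 2: read the serial number of the last trade on this account.
        if seen_log_id is None:
            row = conn.execute(
                "SELECT balance, log_id FROM account_balance WHERE user_id = ?",
                (user_id,),
            ).fetchone()
            if row is None:
                raise KeyError(user_id)
            seen_log_id = row[1]

        # Step 3: write the log record (sqlite3 opens the transaction here).
        new_log_id = conn.execute(
            "INSERT INTO account_log (user_id, amount) VALUES (?, ?)",
            (user_id, amount),
        ).lastrowid

        # Step 4: update the balance only if nobody else has traded since step 2.
        updated = conn.execute(
            "UPDATE account_balance SET balance = balance + ?, log_id = ? "
            "WHERE user_id = ? AND log_id = ?",
            (amount, new_log_id, user_id, seen_log_id),
        )

        # Step 5: a "successful" UPDATE may still have changed 0 rows.
        if updated.rowcount != 1:
            conn.rollback()
            return False
        conn.commit()
        return True
    except Exception:
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
setup_tables(conn)
print(trade(conn, 0, 100))                 # fresh serial number
print(trade(conn, 0, 100, seen_log_id=3))  # stale serial number
```

A caller that gets False back would typically re-read the balance and retry. The important part is the `rowcount` check: without it, the stale attempt would commit a log record with no matching balance change.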

Finally, here is the DDL for the transaction log table and the balance table, which you can use as a reference when running the examples yourself.


CREATE TABLE `account_log` (
  `log_id` int NOT NULL AUTO_INCREMENT COMMENT 'Log ID (serial number)',
  `amount` int NOT NULL COMMENT 'Transaction amount',
  `timestamp` datetime NOT NULL COMMENT 'Timestamp',
  `from_system` int NOT NULL COMMENT 'Source system code',
  `from_system_transaction_number` int DEFAULT NULL COMMENT 'Transaction number in the source system',
  `from_account` int DEFAULT NULL COMMENT 'Source account',
  `to_system` int NOT NULL COMMENT 'Destination system code',
  `to_system_transaction_number` int DEFAULT NULL COMMENT 'Transaction number in the destination system',
  `to_account` int DEFAULT NULL COMMENT 'Destination account',
  `transaction_type` int NOT NULL COMMENT 'Transaction type code',
  PRIMARY KEY (`log_id`)
);


CREATE TABLE `account_balance` (
  `user_id` int NOT NULL COMMENT 'User ID',
  `balance` int NOT NULL COMMENT 'Balance',
  `timestamp` datetime NOT NULL COMMENT 'Timestamp',
  `log_id` int NOT NULL COMMENT 'Log ID of the last transaction',
  PRIMARY KEY (`user_id`)
);

 

Summary

The account system records the balance of each user. To keep the data traceable, it must also record the account's transaction log. Log records are append-only; modification and deletion are not allowed under any circumstances. On every trade, the log and the balance must be updated together in the same database transaction.

Transactions have four basic properties: atomicity, consistency, isolation, and durability, or ACID. They ensure that the data updates performed in a transaction either all succeed or all fail, and that while a transaction is executing, its intermediate state is invisible to other transactions.

ACID is an ideal. Implementing the C and I perfectly in particular causes a serious drop in database performance. Therefore, MySQL offers four selectable isolation levels that sacrifice some isolation and consistency in exchange for performance. Of the four, only RC and RR are in common use, and the only difference between them is whether updates committed by other transactions are visible inside a transaction that is still in progress.

Thanks for reading, if you think this article has inspired you, please share it with your friends.

Thinking question

How to build a product search system?

You are welcome to leave a message to discuss and exchange ideas with me: "learn together, grow together".

Previous article

E-commerce system architecture design series (5): How to design a complex and important shopping cart system?


Origin blog.csdn.net/hemin1003/article/details/131808826