CRUD in the eyes of an architect: do you really know how to write a status update? There is a lot to learn about UPDATE!

Statement

The story below and the technical points in it really happened to me. To record what I learned while keeping it a relaxed read, I have adapted it into a humorous story.

Cast:
Brother H: our project director and technical lead, the big-brother figure who has the final say on most problems.
Big C: our architect, with broad technical coverage, who thinks problems through thoroughly and has plenty of tricks up his sleeve.
Little L: a fresh graduate newly hired as a developer, with tremendous passion for technology. What he needs is time and experience to hone it!
Big V: a senior Java developer with more than 3 years of experience, working his way toward becoming an architect!
Fox-san: our tester, with rich experience in functional and performance testing, who always finds plenty of bugs during testing.

Special Note

As before, the material for this story comes from my own work experience. It is a real case from within our team. For those just entering the field of mobile payments it should be quite instructive. Let us encourage each other.

Background knowledge

To follow this story, we first need to go over the general flow of a mobile payment. Because this touches on company assets and some confidential content, I will reduce the whole mobile payment system to a simplified model. The data model and process described in today's story are therefore for learning and research only and are not sufficient for production. Remember that!

In general, a mobile payment involves several objects: orders, order items (products), and payment records. Their relationship generally looks like this:
[Figure: relationship between orders, order items, and payment records]
An order carries the information of the products in it. In other words, one or more products may be merged into a single order for payment.
An order may also have one or more payment records. Why? Set aside the case of one order being split between WeChat Pay and Alipay. Suppose we want to design a coupon system: the amount deducted by the coupon should then generate a payment record of its own. Otherwise, at day-end reconciliation the order total and the payment record total will not match. The same applies when showing the order to the user: the part of the amount offset by the coupon needs to be accounted for there as well. So in practice the design usually allows one order to have multiple payment records.
Of course, this model has little to do with the story itself. I introduce it briefly only as background.
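To make the coupon argument concrete, here is a minimal sketch of the simplified model. The class names, channel names, and amounts are all made up for illustration and are not our production schema. It only shows that the order total and the sum of payment records match once the coupon deduction has its own payment record.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class OrderItem:
    name: str
    price: Decimal


@dataclass
class PaymentRecord:
    channel: str      # hypothetical channel names, e.g. "alipay" or "coupon"
    amount: Decimal


@dataclass
class Order:
    items: list[OrderItem] = field(default_factory=list)
    payments: list[PaymentRecord] = field(default_factory=list)

    def total(self) -> Decimal:
        """Sum of the item prices in this order."""
        return sum((i.price for i in self.items), Decimal("0"))

    def paid(self) -> Decimal:
        """Sum of all payment records attached to this order."""
        return sum((p.amount for p in self.payments), Decimal("0"))

    def reconciles(self) -> bool:
        return self.total() == self.paid()


order = Order(items=[OrderItem("mug", Decimal("30.00")),
                     OrderItem("pen", Decimal("20.00"))])

# The user pays 40.00 through the channel; a coupon covers the other 10.00.
order.payments.append(PaymentRecord("alipay", Decimal("40.00")))
without_coupon_record = order.reconciles()   # the 10.00 is unaccounted for

order.payments.append(PaymentRecord("coupon", Decimal("10.00")))
with_coupon_record = order.reconciles()      # totals now match
```

Amounts use Decimal rather than float so that the equality check in reconciliation is exact.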
As for the payment itself: generally speaking, payments through today's mainstream payment channels are asynchronous. For example, look at the Alipay payment API:
[Figure: Alipay payment API flow]
Clearly, we should create the order and the payment record in step 1, while the final result of the payment in Alipay only reaches us in step 7, through Alipay's asynchronous callback notification.
There is no doubt that the whole procedure is asynchronous. Drawn as a sequence diagram, it should look like this:
[Figure: sequence diagram of the asynchronous payment flow]
Again: I have simplified the data model and process above a great deal and left out a lot of exception handling. They are for study and research only and are not sufficient for production. Remember that! If you really want to understand this area in depth, send me a private message and we can talk.

Well, if you have followed the processes and model above, we can start today's story.

A development task arrives

That day, the product manager strode over in high spirits, MacBook Air in hand: "Brother H, can you schedule this for us? The app store payment we discussed earlier, when will it go live?"
"How about right now?" Brother H was very straightforward. The requirement had already dragged on for two weeks, and without an answer he would probably be hung up and beaten by the product manager.
The requirement was simple: build a basic payment system for the app store, because the mall, which previously only let users redeem goods, was now adding the ability to buy them with money.
According to the general flow just described, only two pieces needed to be developed on the back end: order placement and payment.
[Figure: the two back-end modules, ordering and payment]
Then, after a heated requirements review, Brother H made the call directly. Integrating with the payment channels takes some development experience and time was short, so putting a novice on it was out. That part of the work went to Big V.
As for the order system, the logic is relatively simple: place the order, then maintain the order status. That went to our friend Little L.

It's just updating a record's status, right? Watch me get it done

After getting the requirement, Little L gave the whole business process some thought and carefully drew a flow chart, as follows:
[Figure: brief flow chart of the business process]
The business process was very simple, so Little L dove right in without thinking much more. When he got to the payment interface, he discovered that Big V's payment interface was actually asynchronous. So Little L moved the code he had written for handling the payment result into the consumer that receives Big V's payment callback from the MQ.

In general, an asynchronous processing mechanism consists of three parts: submitting the request, handling the callback, and actively querying the result. This story focuses on submitting the request and handling the callback and leaves active querying aside. Just be aware that when you implement this yourself, you should not rely only on the downstream system's callback mechanism. You should also have your own active query mechanism.
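Here is a rough sketch of what an active query fallback could look like. Everything in it is hypothetical: the status codes, the Order class, and the query_channel function are stand-ins, and a real implementation would run the query on a schedule with retries and timeouts.

```python
from typing import Callable, Optional

PENDING, SUCCESS, CLOSED = 1, 2, 3   # hypothetical status codes


class Order:
    def __init__(self, pay_id: str) -> None:
        self.pay_id = pay_id
        self.status = PENDING


def apply_result(order: Order, result: int) -> bool:
    """Apply a payment result only while the order is still pending."""
    if order.status != PENDING:
        return False          # already final: ignore duplicates and late results
    order.status = result
    return True


def active_query(order: Order,
                 query_channel: Callable[[str], Optional[int]]) -> bool:
    """Fallback for a lost callback: ask the channel for the result ourselves."""
    if order.status != PENDING:
        return False
    result = query_channel(order.pay_id)   # None means no result is known yet
    if result is None:
        return False
    return apply_result(order, result)


# A dict stands in for the payment channel's query interface.
channel_results = {"p1": SUCCESS}

lost = Order("p1")                               # its callback never arrived
recovered = active_query(lost, channel_results.get)
duplicate = apply_result(lost, CLOSED)           # a late callback is ignored
```

The callback handler and the active query share apply_result, so whichever one arrives first wins and the other becomes a no-op.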

A few days later, the order system was done. Just as Little L finished his test cases, Big V finished on his side too. The two immediately started joint debugging, and it went very smoothly: within two days they had covered every test point they could think of. So they made their final code commit, tagged it, and handed it to Fox-san for a full test.

Functional testing and acceptance pass

Fox-san first followed their deployment document and built the system step by step in the test environment. Then, following the product requirements and the test cases prepared in advance, Fox-san ran through everything and found no functional problems.
Place order, pay, order succeeds: OK.
Place order, cancel payment, order closed: OK.
Place order, insufficient funds, order closed: OK.
Fox-san was clearly quite satisfied with these results. Next came preparation for the stress test.

The stress test starts, and then...

When the stress test began, Fox-san put in the payment mock that Big V had prepared (the mock always returns a successful payment).

Here is what a mock (sometimes called a baffle) means.
In a stress test the data is generated at random, so in the example above it is clearly impossible to send real payment requests to the payment channel. In other words, we put a payment mock between the payment system and the payment channel. It simulates the channel's response to the payment system (it reports the payment as successful without actually initiating one), so that the whole call chain can be exercised. The chain then looks like this:
[Figure: call chain with the payment mock in place of the payment channel]
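In code, a payment mock can be as small as the sketch below. The PaymentChannel interface, the method name, and the response shape are assumptions made up for this example. Big V's real mock sat in front of the real channel integration.

```python
from typing import Protocol


class PaymentChannel(Protocol):
    """Hypothetical interface the payment system uses to talk to a channel."""

    def pay(self, pay_id: str, amount_cents: int) -> dict: ...


class MockPaymentChannel:
    """Stands in for the real channel in a stress test; sends no real request."""

    def __init__(self) -> None:
        self.calls = 0

    def pay(self, pay_id: str, amount_cents: int) -> dict:
        self.calls += 1
        # Always report success, as the mock in the story does.
        return {"pay_id": pay_id, "status": "SUCCESS"}


def submit_payment(channel: PaymentChannel, pay_id: str, amount_cents: int) -> bool:
    """Payment system code: unaware of whether the channel is real or a mock."""
    response = channel.pay(pay_id, amount_cents)
    return response["status"] == "SUCCESS"


mock = MockPaymentChannel()
ok = submit_payment(mock, "p1", 1999)
```

Because the payment system depends only on the interface, swapping the real channel for the mock needs no change in the code under test.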

The stress test results were decent in relative terms: 400 TPS. For a system that currently handles tens of thousands of orders a day, Fox-san considered that perfectly adequate.
Fox-san finished the stress test report and was about to clean up the data and call it a day when a few odd-looking records caught Fox-san's attention. Slowly, Fox-san frowned: something did not seem right.

Cannot reproduce locally, and even Big V is stumped

What had Fox-san found? It turned out that many order records in the database were still in the pending-payment state! That made no sense: the mock returns a successful payment, so how could orders end up unpaid?
Fox-san immediately called Big V over to look for possible causes. Comparing the data on both sides, Big V found that the payment record table was actually correct. In other words, the payment system's processing logic was fine, and the callback message sent to the order system should also have said the payment succeeded. Yet in the orders table the status had been updated to pending payment.
Big V immediately called Little L over to look at the problem. Together they went carefully through the order callback handling logic Little L had written, and the run logs.
[Figure: log output of the order callback handler]
Spooky! payStatus = 2 means the payment succeeded, and the update returned 1, which means the row was updated successfully. So why did the database end up showing payStatus = 1?!
This completely overturned Little L's view of the world! After writing UPDATE statements for so long, they suddenly felt so unfamiliar! Is being a programmer really this hard? If I cannot even write an UPDATE, how am I supposed to walk the long road ahead?
Little L and Big V then spent several hours on it, simulating the data locally and again in the development environment, but they were never able to reproduce the problem.
Out of options, they called Big C over to help. Big C might have a way. Isn't he the one with all the tricks?
Big C looked at the program logs, then at Little L's SQL for updating the data:

update t_trade_record set payStatus = 2 where pay_id = 'xxx' and update_time = 'xxx'

A note on why Little L added the update_time condition to the WHERE clause: it is mainly there to detect several services updating the same row at the same time. If another service has updated the row, its update_time will have advanced, so this UPDATE returns 0 and the program can handle it accordingly.
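The idea behind the update_time condition can be tried out with the sketch below. It uses an in-memory SQLite database as a stand-in for MySQL, and the timestamps are invented. Only the table and column names come from Little L's SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_trade_record ("
    "pay_id TEXT PRIMARY KEY, payStatus INTEGER, update_time TEXT)"
)
conn.execute("INSERT INTO t_trade_record VALUES ('p1', 0, '2020-03-10 14:33:06')")


def update_status(pay_id: str, new_status: int, seen_update_time: str, now: str) -> int:
    """Update only if update_time still has the value we read; return affected rows."""
    cur = conn.execute(
        "UPDATE t_trade_record SET payStatus = ?, update_time = ? "
        "WHERE pay_id = ? AND update_time = ?",
        (new_status, now, pay_id, seen_update_time),
    )
    return cur.rowcount


# Two services both read the row while update_time was 14:33:06.
first = update_status("p1", 2, "2020-03-10 14:33:06", "2020-03-10 14:33:08")
# The second writer still holds the old update_time, so its update matches no row.
stale = update_status("p1", 1, "2020-03-10 14:33:06", "2020-03-10 14:33:09")

final_status = conn.execute("SELECT payStatus FROM t_trade_record").fetchone()[0]
```

When the two writes land in different seconds, the stale writer gets 0 affected rows and the caller knows someone else got there first.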

Finally Big C looked at the problem rows in the database. He seemed to have an answer in mind but was not fully sure.
He left with one line: "I think I know where the problem is. I am going to ops to get something to prove my idea!"

So this is what binlog can do

What did Big C want from ops? Half an hour later he was back, holding a file: mysql-bin.000001.
It turned out Big C had gone to ops for the MySQL binlog, in order to find the data update records from the stress test period.
Big C copied the binlog to his machine, used the mysqlbinlog tool that ships with his local MySQL installation, and deftly typed this command:

mysqlbinlog --base64-output=decode-rows -v -d xxx --start-datetime='2020-03-10 14:33:06' --stop-datetime='2020-03-20 14:34:07' mysql-bin.000001 > 1.sql

This produces a file named 1.sql containing all data update records for database xxx between 2020-03-10 14:33:06 and 2020-03-20 14:34:07.
[Figure: decoded binlog output for the order]
Using the order number Little L provided, Big C soon found every operation on that row during this period.
The payStatus of the row changed as follows: insert (0) -> update (2) -> update (1)

Big C said: "See, here is the update that your log reported as successful. The final result really is 1, because the row was updated to 2 in the middle but was then updated to 1 at the end!"

It turned out that with the payment mock added and the stress test hammering the system, CPU pressure rose. The update for submitting the payment (to pending payment), which normally completes within a few milliseconds, was dragged out until after the payment callback had already been processed!

[Figure: timeline of the delayed submit update and the callback update]
But didn't Little L check update_time? Why was there still a problem?

The reason is that update_time is only accurate to the second. If the two updates are reordered within the same second, this approach cannot catch the problem.
It is like using CAS for multithreaded updates: by itself it cannot avoid the ABA problem.
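The sequence found in the binlog can be replayed with the sketch below. It is a reconstruction under assumptions: SQLite stands in for MySQL, the timestamps are made up, and the callback's update is assumed to use the same update_time condition as Little L's SQL.

```python
import sqlite3


def replay(read_at: str, callback_at: str, submit_at: str):
    """Replay both updates in binlog order; return (callback rows, submit rows, final status)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t_trade_record (pay_id TEXT, payStatus INTEGER, update_time TEXT)"
    )
    conn.execute("INSERT INTO t_trade_record VALUES ('p1', 0, ?)", (read_at,))
    sql = (
        "UPDATE t_trade_record SET payStatus = ?, update_time = ? "
        "WHERE pay_id = 'p1' AND update_time = ?"
    )
    # The callback handler runs first and marks the order as paid (2).
    callback_rows = conn.execute(sql, (2, callback_at, read_at)).rowcount
    # The delayed "submit payment" update then sets pending payment (1),
    # still using the update_time it read before the callback ran.
    submit_rows = conn.execute(sql, (1, submit_at, read_at)).rowcount
    final = conn.execute("SELECT payStatus FROM t_trade_record").fetchone()[0]
    conn.close()
    return callback_rows, submit_rows, final


# Second precision: all three timestamps collapse to the same value,
# so the stale update still matches and overwrites the paid status.
same_second = replay(
    "2020-03-10 14:33:06", "2020-03-10 14:33:06", "2020-03-10 14:33:06"
)

# Millisecond precision: the callback's write changes update_time,
# so the stale update matches no row.
with_millis = replay(
    "2020-03-10 14:33:06.100", "2020-03-10 14:33:06.350", "2020-03-10 14:33:06.420"
)
```

Finer timestamps only shrink the window. Two writes can still land on the same timestamp value, which is why the fix does not rely on update_time alone.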

Once the problem was understood, Little L modified the SQL and the problem was solved. The order record is now updated to pending payment only while it is still in the created state. Any other state means a status change has already happened, and the update is skipped.

update t_trade_record set payStatus = 1 where pay_id = 'xxx' and update_time = 'xxx' and payStatus = 0
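Replaying the same out-of-order sequence against the guarded statement shows the effect. This is again a sketch on SQLite with invented timestamps. The guard on the callback's own update (payStatus IN (0, 1)) is my assumption for the example, since only the submit-side SQL is shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_trade_record (pay_id TEXT, payStatus INTEGER, update_time TEXT)"
)
ts = "2020-03-10 14:33:06"   # everything happens within the same second
conn.execute("INSERT INTO t_trade_record VALUES ('p1', 0, ?)", (ts,))

# Callback arrives first: created (0) or pending (1) -> paid (2).
# The IN (0, 1) guard is an assumption for this sketch.
callback_rows = conn.execute(
    "UPDATE t_trade_record SET payStatus = 2, update_time = ? "
    "WHERE pay_id = 'p1' AND update_time = ? AND payStatus IN (0, 1)",
    (ts, ts),
).rowcount

# Delayed submit update: only allowed while the row is still created (0).
submit_rows = conn.execute(
    "UPDATE t_trade_record SET payStatus = 1, update_time = ? "
    "WHERE pay_id = 'p1' AND update_time = ? AND payStatus = 0",
    (ts, ts),
).rowcount

final_status = conn.execute("SELECT payStatus FROM t_trade_record").fetchone()[0]
```

The late update now affects 0 rows even though update_time still matches, so the paid status survives.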

There is a routine to writing status updates

There are many methods that help us manage state changes in data and prevent states from getting mixed up in extreme cases.

  1. You only need to guarantee the eventual consistency of the data.

Distributed systems have the well-known CAP theorem. We tend to choose availability and partition tolerance to increase the throughput and availability of the system, at the cost of strong consistency, and rely on eventual consistency to make sure the system's final result is correct.

  2. Work out the business logic and draw a state flow diagram for the data whose state needs to change.
    [Figure: state flow diagram of the order]

The diagram shows clearly how the state of the business data flows under each condition. The routine here is: "A state directly connected to the end arrow is called a final state, and once data enters a final state it should never be changed again." This guides how we write updates for such data: for data that may already be in a final state, the UPDATE can include conditions of the form stat <> 1 and stat <> 2 ..., so that data already in a final state is not updated back to an intermediate state because of a timing issue. That guarantees the eventual consistency of the data!
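As a sketch of this routine, the helper below builds the stat <> ... guard from a list of final states. The table name t_order, the column stat, and the codes 2 (paid) and 3 (closed) are assumptions for the example, and SQLite again stands in for MySQL.

```python
import sqlite3

FINAL_STATES = (2, 3)   # assumption: 2 = paid, 3 = closed


def move_to(conn: sqlite3.Connection, pay_id: str, new_stat: int) -> int:
    """Change stat unless the row is already in a final state; return affected rows."""
    guard = " AND ".join("stat <> ?" for _ in FINAL_STATES)
    sql = f"UPDATE t_order SET stat = ? WHERE pay_id = ? AND {guard}"
    return conn.execute(sql, (new_stat, pay_id, *FINAL_STATES)).rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_order (pay_id TEXT, stat INTEGER)")
conn.execute("INSERT INTO t_order VALUES ('p1', 0)")

results = [
    move_to(conn, "p1", 2),   # created -> paid: allowed
    move_to(conn, "p1", 1),   # late "pending" update: row is final, skipped
    move_to(conn, "p1", 3),   # late "closed" update: row is final, skipped
]
final_stat = conn.execute("SELECT stat FROM t_order").fetchone()[0]
```

Only the placeholders are generated from FINAL_STATES. The values themselves are still passed as bound parameters.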

  3. When the business logic is complex and there are many states, consider using the state machine pattern in your code.
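A very small state machine might look like the sketch below. The CLOSED code and the transition table are assumptions made up for the example. In particular, allowing CREATED to go straight to PAID reflects the early callback seen in the story, not a rule from our real system.

```python
from enum import Enum


class PayStatus(Enum):
    CREATED = 0
    PENDING = 1
    PAID = 2
    CLOSED = 3   # hypothetical code for a closed order


# Allowed transitions. States with no outgoing edges are final states.
TRANSITIONS = {
    PayStatus.CREATED: {PayStatus.PENDING, PayStatus.PAID, PayStatus.CLOSED},
    PayStatus.PENDING: {PayStatus.PAID, PayStatus.CLOSED},
    PayStatus.PAID: set(),
    PayStatus.CLOSED: set(),
}


class IllegalTransition(Exception):
    pass


def transition(current: PayStatus, target: PayStatus) -> PayStatus:
    """Return the new state, or raise if the move is not in the table."""
    if target not in TRANSITIONS[current]:
        raise IllegalTransition(f"{current.name} -> {target.name}")
    return target


# The callback arrives early and moves the order straight to PAID.
state = transition(PayStatus.CREATED, PayStatus.PAID)

# The delayed "pending" update is then rejected instead of overwriting PAID.
try:
    transition(state, PayStatus.PENDING)
    rejected = False
except IllegalTransition:
    rejected = True
```

The table keeps all legal moves in one place, so adding a state means editing data rather than hunting through scattered if statements. The database update should still carry its own state guard, since this check runs in application memory.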

Summary

The example above explains well why a business system can run for a long time without problems and with hardly any releases, and then one day suddenly show a lot of problems in production.
Many problems are in fact hidden behind high concurrency and are hard to reproduce under normal, low load. So sometimes we run performance tests not only because the business scenario demands high concurrency, but because they help us assess system capacity, tune configuration parameters, and find bugs that cannot be found under low load.
That is all for today's story. I hope you got something out of it.


Origin blog.csdn.net/m0_37911064/article/details/104910754