"Hanging the Interviewer" series-repeated consumption, sequential consumption, distributed transactions

The more you know, the more you don’t know

Preface

Message queues are used so widely in Internet back-end systems that almost every back-end interviewer will grill candidates from every angle on how message queues are used and how they work.

As an interview ace who has collected offer after offer from Internet companies, I have beaten countless competitors. Every time I watched those lonely figures leave in disappointment, I felt a little guilty (please allow me the exaggeration).

So one lonely, sleepless night, I, your warm-hearted author, reflected on all this and decided to start writing the "Hanging the Interviewer" series. I hope it helps readers breeze through future interviews, counterattack the interviewer from every angle, leave the candidates interviewing alongside them dumbfounded, and harvest offers from the big companies!

Recap

In the last issue, I briefly introduced the basics of message queues, including their application scenarios and the problems that can arise once you use them. However, the last issue did not explain how to solve those problems, because I had to keep the length under control (clearly he just doesn't think MQ has enough material for many issues, so he squeezed out one more! Scumbag)

Ahem, back to business. If you haven't read the previous issue, reading it first will help with this one:

"Hanging the Interviewer" series-message queue basics

Interview begins

A handsome middle-aged man in a plaid shirt walks up to you carrying a scratched-up Mac. You look at his shiny bald head and think: this must be some top-level architect! But we have read warm-guy Ao Bing's series, we come well prepared, and there is nothing to be afraid of.

That's right, young man, it's me again. You slipped away halfway through last time, so this time I'm going to question you properly.

Hello, interviewer. I was in a hurry last time because Ao Bing's series had just been updated, so I went home to read it!

Like I'd believe that. Let's get started. Last time we got as far as repeated consumption of messages in a message queue. Can you tell me what that is about?

Repeated consumption is a problem you must consider once you use a message queue, and it is a fairly serious and common one. In my (handsome Bing's) development work, whenever a message queue is involved, repeated consumption is the first thing I think about.

For example, take this scenario: after a user places an order successfully, I need to add GMV (total sales) for them on a campaign page, and finally give them rewards based on their GMV. This is a very common mechanic in e-commerce campaigns.

It works in tiers: whichever tier your accumulated order amount reaches determines the reward you get.

I can tell you with 10,000% certainty that this kind of campaign page is updated asynchronously (don't ask me how I know; Ao Bing built the backend for this campaign). Otherwise, adding GMV for the user synchronously at order time means writing to that table on every order. Think about how many times that table would be hit on Double Eleven. Neither the database nor the cache could take it.

Everyone has probably had this experience too: after placing an order you check a campaign page, and sometimes the update is there immediately while other times it lags for a long while. Why? The speed depends on how fast the message queue is being consumed. If consumption is slow, you see the update later.

When you place an order and the payment succeeds, a message is sent. The developer of the campaign above listens for that payment-success message. When I receive the message that your order was paid, I go to my campaign GMV table and add the amount for you. So far this probably all sounds logical.

But here is the thing about message queues in general: they all have a retry mechanism. If an exception occurs in my downstream business, I throw an exception and ask you to send the message again.

If something fails in my campaign service, you have to be able to resend. But think carefully: where is the problem?

That's right: you are not the only one listening to this message. Other services are listening too, and they can fail as well. If one of them fails, it will request a redelivery, even though your processing had actually succeeded. When the redelivered message reaches you, doesn't the amount get added twice?

Right??? Isn't that so???

Still don't get it? Take this case:

Say our points (loyalty) system fails to process the message. It will certainly ask for the message to be resent, and on the retry the points system receives and processes it successfully. But the campaign, coupon, and other services are listening to the same message. Won't the campaign system add GMV twice and the coupon be deducted twice?

In real life, retries are normal. Network jitter, bugs in developers' code, and data problems can all cause failures that require redelivery.

Well, young man, you have analyzed it carefully. So how do you guard against this during development?

Generally we call this kind of handling interface idempotence.

Idempotence (idempotent, idempotence) is a concept from mathematics and computer science, commonly seen in abstract algebra.
In programming, an idempotent operation is one whose effect is the same no matter how many times it is executed.
An idempotent function or method can be executed repeatedly with the same parameters and produce the same result. Repeated executions leave the system in the same state as a single execution, so there is no need to worry that retries will change the system further.
For example, a "setTrue()" function is idempotent: no matter how many times it is executed, the result is the same. More complex operations are made idempotent by using a unique transaction number (serial number).

In plain terms: call my interface with the same parameters as many times as you like, and the result is the same as calling it once. If you add GMV for the same order number, the total after N calls is the same as after one call.

But if you don't make it idempotent, multiple calls for one order add the amount multiple times, and likewise multiple calls to a refund deduct the money multiple times.
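
To make the difference concrete, here is a minimal sketch of a non-idempotent and an idempotent handler side by side. The in-memory dict and set, the function names, and the user and order ids are all hypothetical stand-ins for a real GMV table and a record of processed orders.

```python
# Hypothetical in-memory stand-ins for the campaign's GMV table
# and for its record of already-processed order numbers.
gmv_by_user = {}
processed_orders = set()

def add_gmv_naive(user_id, order_id, amount):
    """Not idempotent: every redelivery adds the amount again."""
    gmv_by_user[user_id] = gmv_by_user.get(user_id, 0) + amount

def add_gmv_idempotent(user_id, order_id, amount):
    """Idempotent: a given order number is only ever counted once."""
    if order_id in processed_orders:
        return False  # duplicate delivery, nothing to do
    processed_orders.add(order_id)
    gmv_by_user[user_id] = gmv_by_user.get(user_id, 0) + amount
    return True

# The same payment message delivered three times (original plus two retries).
for _ in range(3):
    add_gmv_naive("naive_user", "order-1", 100)
    add_gmv_idempotent("safe_user", "order-1", 100)

print(gmv_by_user)  # {'naive_user': 300, 'safe_user': 100}
```

The in-memory set only illustrates the idea; where the "already processed" record should really live depends on how important the scenario is.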

(Figure: the general processing flow.)

So how do you guarantee it?

This is how I (handsome Bing) usually answer:

Hello, handsome interviewer. For idempotence I generally decide by scenario whether to use a strong check or a weak check. Anything involving money is critical, so it gets a strong check; unimportant scenarios get a weak check.

Strong check:

For example, when you receive the user's payment-success message and call the interface that adds GMV, that interface also calls another one that writes a flow record (a ledger entry), and the two are placed in a single transaction: they succeed together or fail together.

Every time a message arrives, take the order number + business scenario (for example, the Tmall Double Eleven campaign) as a unique identifier and look it up in the flow table. If such a record already exists, return immediately and skip the rest of the process; if not, execute the logic that follows.

The reason for using a flow table is that campaigns like this involve money. If something goes wrong, you can reconcile the accounts against the flow table, and it also helps developers locate the problem.
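
A minimal sketch of the strong check, assuming an in-memory SQLite database in place of the real one. The `gmv` and `flow` tables, their columns, and `on_payment_success` are hypothetical names chosen for illustration, not the author's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gmv (user_id TEXT PRIMARY KEY, total INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE flow ("
    "order_id TEXT NOT NULL, scene TEXT NOT NULL, amount INTEGER NOT NULL, "
    "PRIMARY KEY (order_id, scene))"
)
conn.commit()

def on_payment_success(user_id, order_id, scene, amount):
    """Return True if GMV was added, False if the message was a duplicate."""
    try:
        with conn:  # one transaction: commit on success, roll back on any exception
            already = conn.execute(
                "SELECT 1 FROM flow WHERE order_id = ? AND scene = ?",
                (order_id, scene),
            ).fetchone()
            if already:
                return False  # flow record exists, skip the rest
            conn.execute("INSERT INTO flow VALUES (?, ?, ?)", (order_id, scene, amount))
            conn.execute("INSERT OR IGNORE INTO gmv VALUES (?, 0)", (user_id,))
            conn.execute(
                "UPDATE gmv SET total = total + ? WHERE user_id = ?", (amount, user_id)
            )
            return True
    except sqlite3.IntegrityError:
        # A concurrent duplicate wrote the flow record first; the primary key rejects ours.
        return False

first = on_payment_success("user-1", "order-1", "double11", 100)
retry = on_payment_success("user-1", "order-1", "double11", 100)
total = conn.execute("SELECT total FROM gmv WHERE user_id = ?", ("user-1",)).fetchone()[0]
```

Putting the primary key on order number + scenario means the database itself refuses a second flow record even if two duplicates race past the lookup, and the surviving flow rows are what you reconcile against later.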

Some readers may still be a little confused, and people in the talent exchange group have suggested adding a bit of pseudocode to the examples, so starting with this issue I will include some.

Weak check:

For simple, unimportant scenarios, such as sending someone a text message, I use the id + scenario as a unique Redis key and put it in the cache. The expiry time depends on your scenario; within that window, duplicate messages are detected through Redis.
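
A minimal sketch of the weak check. To keep it runnable without a Redis server, `TtlKeyStore` is a hypothetical in-memory stand-in whose `set_if_absent` mimics the semantics of Redis `SET key value NX EX ttl`; the key format, the 600-second default, and `send_sms_once` are likewise assumptions for illustration.

```python
import time

class TtlKeyStore:
    """Tiny in-memory stand-in for a cache such as Redis (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._expiry = {}    # key -> time at which the key expires
        self._clock = clock

    def set_if_absent(self, key, ttl_seconds):
        """True only for the first caller while the key has not expired."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True

store = TtlKeyStore()
sent = []  # placeholder for the real SMS gateway

def send_sms_once(message_id, scene, phone, ttl_seconds=600):
    key = f"dedup:{scene}:{message_id}"
    if not store.set_if_absent(key, ttl_seconds):
        return False  # seen within the window, skip the duplicate
    sent.append(phone)
    return True

first = send_sms_once("msg-1", "order_paid", "13800000000")
duplicate = send_sms_once("msg-1", "order_paid", "13800000000")
```

Checking and setting the key in one step matters: a separate "get, then set" leaves a gap in which two duplicates can both pass the check.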

With a KV store it doesn't matter much even if something gets lost. At worst you lose an insignificant notification text (don't tell me you have never had a verification-code text go missing?).

Many companies use tokens for weak checks. There are plenty of tricks, but important scenarios must use a strong check. When you actually have to investigate a problem and there is no data persisted on disk, you feel empty inside, the same feeling as after breaking up with a girlfriend. (How would I know that feeling when I'm single? Guess.)

Have you ever dealt with scenarios that require messages to be consumed in order? How do you guarantee it?

No! Done!

Hey, you can't just say no. Even if you really haven't, after reading handsome Ao Bing's article you have to say yes!

Tip: To be honest, ordered consumption is hard to introduce here. From last week to this week I asked many senior colleagues around me, and such scenarios rarely come up in development. I discussed it with Sanwai several times, and most of what is online uses binlog synchronization as the example; there don't seem to be many other scenarios.

Generally, the problem arises when messages from several different operations in the same business scenario are sent at about the same time. The original order is correct, but because they go out together, they get shuffled during consumption.

I used to work on e-commerce campaigns. We all know that data synchronization is under heavy pressure when data volumes are large; sometimes a large table needs hundreds of millions of rows synchronized. (This is not master-slave replication, since master-slave lag would certainly cause problems; it might be syncing from a replica, or from the primary database to a standby.)

In this case we put everything through a queue and consume it slowly. Then the problem appears: we insert, update, and delete the row with the same Id in quick succession and the messages are sent in that order, but during consumption the order becomes update, delete, insert, and the data ends up wrong.

A row that should have been deleted is still there. Isn't that a big problem!

The two orderings produce completely different results.
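
A small sketch of why the order matters, replaying the same three messages in the sent order and in the shuffled order against an empty table. The `apply_all` helper, the tuple layout of a message, and the rule that updating a missing row does nothing are assumptions made for the illustration.

```python
def apply_all(messages):
    """Replay insert/update/delete messages against an empty table keyed by id."""
    table = {}
    for op, row_id, value in messages:
        if op == "insert":
            table[row_id] = value
        elif op == "update":
            if row_id in table:  # updating a missing row changes nothing
                table[row_id] = value
        elif op == "delete":
            table.pop(row_id, None)
    return table

sent_order = [("insert", 1, "v1"), ("update", 1, "v2"), ("delete", 1, None)]
consumed_order = [("update", 1, "v2"), ("delete", 1, None), ("insert", 1, "v1")]

print(apply_all(sent_order))      # {}
print(apply_all(consumed_order))  # {1: 'v1'}
```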

How do you solve it?

Let me briefly describe a simple implementation in RocketMQ, which is what we use.

Tip: Why use RocketMQ as the example? It was open-sourced by Alibaba, and when I asked around, many of my friends' companies use it, so readers are likely to run into it too. I'll cover the specific details of RocketMQ and Kafka later, in their own chapters.

Producers and consumers generally need to ensure message order within a single business scenario, for example order creation, payment, shipment, and receipt.

Don't all of these share one order number? One order has exactly one order number. That makes it simple.

A topic has multiple queues. To ensure ordered sending, RocketMQ provides the MessageQueueSelector queue selection mechanism, which has three implementations.

We can use the hash-modulo approach to send messages for the same order to the same queue, and use synchronous sending: the payment message is sent only after the creation message for the same order has been sent successfully. This guarantees the sending order.
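
A minimal sketch of hash-modulo queue selection. It does not use the RocketMQ client; `select_queue`, the four in-memory lists standing in for a topic's queues, and the order ids are hypothetical. `zlib.crc32` is used as the hash because Python's built-in `hash()` for strings changes between processes.

```python
import zlib

def select_queue(order_id, queue_count):
    """Map an order number to a queue index; the same order always gets the same queue."""
    return zlib.crc32(order_id.encode("utf-8")) % queue_count

queues = [[] for _ in range(4)]  # stand-in for the queues under one topic

events = [
    ("order-1001", "created"),
    ("order-1002", "created"),
    ("order-1001", "paid"),
    ("order-1002", "paid"),
    ("order-1001", "shipped"),
]

# Synchronous sending is modelled by appending one message at a time, in order.
for order_id, step in events:
    queues[select_queue(order_id, len(queues))].append((order_id, step))

steps_1001 = [
    step
    for oid, step in queues[select_queue("order-1001", len(queues))]
    if oid == "order-1001"
]
print(steps_1001)  # ['created', 'paid', 'shipped']
```

Different orders may share a queue, which is fine: ordering only has to hold among the messages of one order.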

The queues within a RocketMQ topic guarantee FIFO (first in, first out) storage; all that remains is for consumers to consume in order.

RocketMQ only guarantees ordered delivery; ordered consumption must be guaranteed by the consumer's business logic!!!

This is easy to understand. Messages for one order go into one queue when sent, because hashing the same order number always gives the same result. They are then consumed by a single consumer, so isn't the order guaranteed?

Different middleware implements true ordered consumption in its own way; this is just one example.

Tip: While I was writing this, someone in the exchange group asked whether, if a queue already hands out messages in order, having a single consumer is enough. What I want to say is that consumers are multi-threaded: even if the messages reach the consumer in order, can you guarantee it processes them in order? It is still safer to send the next message only after the previous one has been consumed successfully.
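
One way to keep processing ordered on the consumer side is to let exactly one worker own each queue and handle its messages one at a time, while different queues still run in parallel. The sketch below assumes that model; the `queues` dict, `drain`, and the order ids are hypothetical, and real middleware offers its own mechanisms for this.

```python
from concurrent.futures import ThreadPoolExecutor

# Two queues; all messages of one order live in the same queue.
queues = {
    0: [("order-1", "created"), ("order-3", "created"), ("order-1", "paid")],
    1: [("order-2", "created"), ("order-2", "paid"), ("order-2", "shipped")],
}
handled = {}  # order_id -> steps in the order they were actually processed

def drain(queue):
    """One worker owns one queue and handles its messages strictly in sequence."""
    for order_id, step in queue:
        handled.setdefault(order_id, []).append(step)

# Parallelism exists across queues, never inside a single queue.
with ThreadPoolExecutor(max_workers=len(queues)) as pool:
    list(pool.map(drain, queues.values()))
```

Handing the messages of one queue to a pool of several threads would bring back exactly the reordering described in the tip above.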

Can you talk to me about distributed transactions?

Distributed transactions are almost essential now that systems are distributed everywhere.

First, what is a transaction?

Distributed transactions, transaction isolation levels, ACID: I believe everyone is familiar with these terms. So what exactly is a transaction?

Concept:

In general usage, it refers to something to be done or being done.
In computing, it refers to a program execution unit that accesses and possibly updates various data items in a database.
Transactions are usually triggered by the execution of user programs written in a high-level database manipulation language or programming language (such as SQL, C++, or Java), and are delimited by statements (or function calls) such as begin transaction and end transaction.
A transaction consists of all operations performed between the beginning of the transaction and the end of the transaction.

Characteristics:

The transaction is the basic unit of recovery and concurrency control.
Transactions should have 4 properties: atomicity, consistency, isolation, and durability. These four properties are usually called the ACID properties.
Atomicity: A transaction is an indivisible unit of work; the operations it contains are either all done or none are done.
Consistency: A transaction must take the database from one consistent state to another consistent state. Consistency and atomicity are closely related.
Isolation: The execution of a transaction cannot be interfered with by other transactions. That is, the internal operations of a transaction and the data it uses are isolated from other concurrent transactions, and concurrently executing transactions cannot interfere with each other.
Durability: Also known as permanence, it means that once a transaction is committed, its changes to the data in the database are permanent. Subsequent operations or failures should not affect them.

If that is still unclear, here is Ao Bing's summary: a transaction is a series of operations that either all succeed or all fail together. From there you can expand on the ACID properties (atomicity, consistency, isolation, durability).

A transaction exists to ensure that a series of operations executes correctly, and it must satisfy the ACID properties.
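
A minimal sketch of "all succeed or all fail" on a single database, using an in-memory SQLite table. The `account` table, the names, and the balances are hypothetical; the CHECK constraint is only there to force the second statement to fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(src, dst, amount):
    """Credit dst and debit src in one transaction; both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the debit broke the CHECK, so the credit was rolled back too

too_much = transfer("alice", "bob", 150)  # fails: alice would go negative
ok = transfer("alice", "bob", 60)         # succeeds
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

After the failed transfer, bob's balance is unchanged even though the credit statement had already run, which is atomicity at work.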

So what is a distributed transaction?

Think about it: your order flow may involve more than 10 steps. The order and payment succeeded, but the coupon deduction failed and the points were not added. In the first case the company gets fleeced; in the second the user is unhappy. So how do these different services make sure they all succeed together?

Exactly: distributed transactions. See, you can answer it yourself!

Tip: Real application scenarios can be many times more complicated than the one I describe. I am using a very simple example just to make it easier to understand.

The distributed transaction approaches I have worked with or learned about fall roughly into:

  • 2PC (two-phase commit)
  • 3PC (three-phase commit)
  • TCC (Try, Confirm, Cancel)
  • Best-effort notification
  • FAR
  • Local message table (developed by eBay)
  • Half message / eventual consistency (RocketMQ)

Here I will introduce the simplest one, 2PC (two-phase commit), and the half-message transaction you are likely to use in practice, i.e. eventual consistency. The goal is to show the role message middleware plays in distributed transactions. The other approaches are broadly similar, and each has its advantages.

Of course, there are also various drawbacks:

For example, database resources are locked for a long time, so the system responds poorly and concurrency cannot scale up.

Network jitter can lead to split-brain, where transaction participants fail to carry out the coordinator's instructions properly, leaving data inconsistent.

Single point of failure: for example, if the transaction coordinator goes down at some moment, a new leader can be elected, but problems will inevitably occur in the process.

As for TCC, only a strong technical team can support its development, and the cost is too high.

Enough chatter, let's introduce these two.

2PC (two-phase commit):

2PC (two-phase commit) can be said to be where distributed transactions began. It acts like a matchmaker: multiple systems are coordinated through message middleware. When the two systems run their transactions, they lock resources but do not commit. Once both are ready, they tell the message middleware, and then each commits its own transaction.

But do you see the problem?

Yes, you may have spotted it: if system A commits successfully but system B's commit fails because of a network blip or some other reason, the transaction as a whole has in fact failed, yet A has already committed.
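
The two phases can be sketched as follows. `Participant` and `two_phase_commit` are hypothetical in-memory stand-ins, with the coordinator reduced to a plain function; no real network, locking, or middleware is involved.

```python
class Participant:
    """A hypothetical system that can prepare, commit, or roll back its local work."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"

    def prepare(self):
        # Phase 1: do the work and hold the locks, but do not commit yet.
        self.state = "prepared" if self.can_prepare else "aborted"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every participant voted yes."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            # Phase 2. A crash or network failure between two of these calls
            # leaves some participants committed and others not.
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False

a, b = Participant("A"), Participant("B")
both_ok = two_phase_commit([a, b])

c, d = Participant("C"), Participant("D", can_prepare=False)
one_refused = two_phase_commit([c, d])
```

The comment inside the commit loop marks the window described above: nothing in the protocol itself undoes A's commit if B's commit never arrives.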

Eventual consistency:

Throughout the process, we can guarantee:

  • If the active business party's local transaction fails to commit, the passive business party will not receive the message.

  • As long as the active party's local transaction executes successfully, the message service will deliver the message to the downstream passive party and ultimately guarantee that the passive party finishes consuming it (whether consumption succeeds or fails, there must be a final state in the end).
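
The first guarantee and the delivery half of the second can be sketched with a toy broker. `Broker`, `place_order`, and the message ids are hypothetical and entirely in-memory; a real broker additionally needs a way to ask the producer about half messages whose outcome it never learned, plus consumer-side retries, and both are left out here.

```python
class Broker:
    """Hypothetical in-memory broker that supports half (prepared) messages."""

    def __init__(self):
        self.half = {}         # msg_id -> body, not visible to consumers
        self.deliverable = []  # committed messages that consumers can receive

    def send_half(self, msg_id, body):
        self.half[msg_id] = body

    def commit(self, msg_id):
        self.deliverable.append(self.half.pop(msg_id))

    def rollback(self, msg_id):
        self.half.pop(msg_id, None)

def place_order(broker, msg_id, body, local_transaction):
    """Active party: half message first, then the local transaction, then commit or roll back."""
    broker.send_half(msg_id, body)
    try:
        local_transaction()
    except Exception:
        broker.rollback(msg_id)  # local transaction failed: downstream never sees it
        return False
    broker.commit(msg_id)        # local transaction succeeded: message becomes deliverable
    return True

def failing_transaction():
    raise RuntimeError("local transaction failed")

broker = Broker()
ok = place_order(broker, "m1", "order-1 paid", lambda: None)
failed = place_order(broker, "m2", "order-2 paid", failing_transaction)
print(broker.deliverable)  # ['order-1 paid']
```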

But that's technology for you: we have to consider all kinds of extreme cases, and a perfect solution is hard to come by. That is why there are so many distributed transaction solutions, such as three-phase commit, TCC, and best-effort notification. You need to know why you are doing it and what the advantages and disadvantages are, and keep them in mind during actual development. Systems are designed around business scenarios: technology divorced from the business is meaningless, and business divorced from technology has nothing to stand on.

As the saying goes: there is no perfect system, only the most suitable one.

End of interview

Not bad, kid, you've got something after all. You answered all of these well. Can you talk to me about RocketMQ tomorrow?

Ao Bing spent so long on this chapter that I'm not even sure he can finish the series. I feel for him and really want to give him a like. Message backtracking will be covered when each message middleware is introduced separately; this chapter is already a bit long.

Summary

In fact, this chapter took me longer to write than the previous one on flash sales, because I didn't know how to present the ordered-message scenario in a way that is easy to understand. In the end I consulted the Internet; real-world scenarios for ordered messages are not that widespread. I also chatted with 3y several times and finally settled on the binlog scenario.

In short, my creative well ran a bit dry this issue. This chapter was really hard to write. Distributed transactions are also a very complicated part of real development; when one is needed, the design takes a long time. Anyway, my flow charts for them are ridiculously long.

Every time, I try to write in an easy-to-understand way. Even so, I don't think this article is easy to understand, but that is the nature of message scenarios. If you add me as a contact, please don't ask me for a lot of details; think a bit more for yourself first. I believe that helps far more than me telling you the answer, right?

Chit-chat

This week I, Ao Bing, placed: I made it onto the CSDN The Force Project list, and the bonus was a whopping 50 yuan!!!

It's not much money, but I'm very happy. When I told my mom, she also felt my luck was turning. It happened to be her birthday. My family never used to make much of birthdays, but this year I am working and I won a bonus, which means a lot, so I secretly asked my cousin to buy her a cake and a gift. Hehe, happy.

DISS

This is a comment that a user of the blog garden (Cnblogs) left under my article. To be honest, I don't know what everyone else thinks; I just want to say: Haha! stupid*

I don't know how many years of experience he has. I wasn't going to say anything, but I found that many people in my group are undergraduates or fresh graduates, so I assume many of my readers are students like that, with no work experience yet. I'm afraid they will be misled by people like this.

I remember saying in the group:

I can tell everyone with 80% certainty that his view is nonsense. The other 20% is that I agree with his point about modesty, but shouldn't modesty be our most basic attitude toward things anyway?

But the idea that you shouldn't show what you've got in an interview? That nobody can be better than the interviewer? I believe some interviewers read my articles too. As technical people, when you interview candidates, don't you want to meet strong people, be eager to hire them, and learn something from them yourself?

And in a normal interview, if you have 1-3 years of experience, your interviewer basically has more than 3 years, and so on up the ladder. Of course, there are also plenty of very strong young leaders (my former company's leader was born in '95, a leader on a very strong product line at ByteDance was born in '96, and so on). Once you have worked for a while, you will find there are things you cannot learn without time to accumulate them. All you can do is take it step by step.

Whether or not those people are young, they are sitting there as the interviewer for a reason. So whatever talent you have, show it to your heart's content. If the interviewer isn't big enough to tolerate your excellence, it doesn't matter if you don't join that company. But technical people like that are very rare; programmers are a group who admire ability.

So show everything you have in the interview and display your talents to your heart's content. The wind is there; just fly.

Thanks

For the distributed transactions part, I referred to a technical sharing by a former colleague and true expert, Lu Ban (his alias). Many thanks for the ideas and the problem analysis in his article!

Every time I write, I ask everyone in the group. Next time, you are welcome to give me more suggestions in my exchange group too. Thank you.

As you can see, it's all very democratic. (Ao Bing, you scumbag, bah, I don't know how to stop writing!)

Daily request for likes

Alright, everyone, that's all for this article. Everyone who made it this far is a real talent.

Every week I publish a few articles in the "Hanging the Interviewer" series and on related Internet technology stacks. Many thanks to all you talents for reading this far. If you think this article is well written and that Ao Bing has something to offer, please like, follow, and share. It really helps me!!!

Writing is not easy. Your support and recognition are my greatest motivation. See you in the next article!

Ao Bing | Article [Original] [Please contact me for reprinting] If there are any errors in this blog, please point them out; I would be very grateful!

The "Hanging the Interviewer" series is updated every week. Follow my official account "JavaFamily" to read new posts first (the official account runs an article or two ahead of the blog). This article is included on GitHub, along with a mind map of interview topics for the big companies; you are welcome to Star it and help improve it. My personal contact information is there as well, so you can reach me directly with questions, and there is also a talent exchange group where we can learn together.

Origin blog.51cto.com/14689292/2546055