Interviewer: How do you ensure that data written to messaging middleware is not lost? [Huperzine Architecture Notes]

Welcome to follow the author's WeChat public account: Huperzine Architecture Notes (ID: shishan100)


Contents
1. Background
2. Kafka's distributed storage architecture
3. Kafka's high-availability architecture
4. Reproducing data loss during Kafka writes with diagrams
5. What is Kafka's ISR mechanism?
6. How do you guarantee that data written to Kafka is not lost?
7. Summary

(1) Background

In this article, let's talk about how to ensure that data written to Kafka is not lost.

If you have read the previous article, "Interviewer: How does messaging middleware achieve hundreds of thousands of writes per second under high concurrency?", you already know that data written to Kafka is ultimately persisted to disk.

We won't go into the details of how data is written to disk here. First, take a look at the figure below, which shows the core architecture of Kafka.

(2) Kafka distributed storage architecture

So here is the question: if you have tens of TB of data every day, can you write all of it to the disk of a single machine? Obviously not!

So we have to consider distributed storage of the data. The distributed storage and high-availability architecture of messaging middleware was already analyzed in an earlier article on interviews at top-tier Internet companies; here we look at it specifically in the context of Kafka.

Kafka has a core concept called a "topic". You can simply think of a topic as a collection of data.

For example, if you want to write the user behavior data of a website into Kafka, you can create a topic called "user_access_log_topic" and write all the user behavior data into it.

Likewise, if you want to write the insert, update, and delete change records of an e-commerce website's order table into Kafka, you can create a topic called "order_tb_topic" and write the change history of the order table into it.

Now take the user behavior topic as an example. If tens of TB of data are written to it every day, do you think it can all be stored on one machine?

Clearly not. That is why Kafka has the concept of a partition: the data of a topic is split into multiple partitions, which you can think of as multiple data shards. Each partition can live on a different machine and store part of the data.

In this way, one large collection of data can be distributed and stored across multiple machines. Take a look at the figure to get a feel for it.
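As a rough illustration of this idea, the Python sketch below spreads the records of one topic over three partitions placed on three machines. The broker names, the partition count, and the use of `zlib.crc32` to hash keys are assumptions made for the example; Kafka's own default partitioner uses a murmur2 hash, which is not reproduced here.

```python
import zlib
from collections import defaultdict

# Hypothetical layout: "user_access_log_topic" has 3 partitions, each on a different broker.
PARTITION_TO_BROKER = {0: "broker-1", 1: "broker-2", 2: "broker-3"}


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition (crc32 stands in for Kafka's murmur2 hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(records):
    """Group (key, value) records by the broker that stores their partition."""
    per_broker = defaultdict(list)
    for key, value in records:
        p = choose_partition(key, len(PARTITION_TO_BROKER))
        per_broker[PARTITION_TO_BROKER[p]].append((p, key, value))
    return dict(per_broker)


records = [(f"user-{i}", f"click-{i}") for i in range(9)]
layout = distribute(records)
for broker, stored in sorted(layout.items()):
    print(broker, len(stored))
```

Because the partition is derived from the key, records with the same key always land in the same partition, and each machine stores only its share of the topic.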

(3) Kafka high availability architecture

But now we run into a problem: if a machine goes down, won't the partition data managed by that machine be lost?

So we need to make redundant copies (replicas) of each partition and place them on other machines. Then, if one machine goes down, only one replica of the partition is lost.

When a partition has multiple replicas, Kafka elects one of them as the leader, and the other replicas act as followers.

Only the leader partition serves external reads and writes; the follower partitions synchronize data from the leader partition.

Once the leader partition goes down, one of the follower partitions is elected as the new leader and takes over serving reads and writes. Doesn't that give us a high-availability architecture?
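A toy model can make the leader/follower roles and the failover step concrete. The sketch below is a simplified in-memory illustration, not Kafka's actual replication code; the class names and the rule "elect the first live follower" are assumptions for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    broker: str
    log: list = field(default_factory=list)
    alive: bool = True


class Partition:
    """Toy model of one partition with a leader and followers (not Kafka's real code)."""

    def __init__(self, brokers):
        self.replicas = [Replica(b) for b in brokers]
        self.leader = self.replicas[0]  # assume the first replica starts as leader

    def write(self, record):
        # Only the leader accepts writes from clients.
        if not self.leader.alive:
            raise RuntimeError("leader unavailable")
        self.leader.log.append(record)

    def sync_followers(self):
        # Each live follower copies whatever the leader has that it is missing.
        for r in self.replicas:
            if r is not self.leader and r.alive:
                r.log.extend(self.leader.log[len(r.log):])

    def fail_leader(self):
        # The leader's machine goes down; elect the first live follower instead.
        self.leader.alive = False
        candidates = [r for r in self.replicas if r.alive]
        if not candidates:
            raise RuntimeError("no replica left to elect")
        self.leader = candidates[0]


p = Partition(["broker-1", "broker-2", "broker-3"])
p.write("order-1")
p.sync_followers()
p.fail_leader()
print(p.leader.broker, p.leader.log)  # broker-2 ['order-1']
```

Since the follower had already synchronized "order-1" before the failure, the new leader can keep serving it.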

Take a look at the figure below to see this process.

(4) Data loss during Kafka writes

Now let's see: under what circumstances can data written to Kafka be lost?

It's actually simple. As we know, data is written to the leader of a partition, and the partition's followers then synchronize the data from the leader.

But what if a piece of data has just been written to the leader partition and, before the followers have had time to synchronize it, the machine hosting the leader partition suddenly goes down?

Take a look:

In the figure above, one piece of data has not yet been synchronized to the follower of Partition0 when the machine hosting the leader of Partition0 goes down.

At this point the follower of Partition0 is elected as the new leader and starts serving requests. Isn't the piece of data the user just wrote now lost?

It is, because the follower of Partition0 never received that latest piece of data.

This is how data loss happens.
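The failure above can be reproduced with a few lines of Python. This is an illustrative model only: the write is acknowledged as soon as the leader has it, which is roughly what a Kafka producer configured with `acks=1` gets, and the variable and function names are hypothetical.

```python
# Minimal model of the loss scenario (an illustration, not Kafka internals).
leader_log = ["msg-1"]
follower_log = ["msg-1"]   # the follower has caught up so far
acknowledged = []          # records the producer was told were written


def write_to_leader_only(record):
    """Acknowledge a write as soon as the leader has it (roughly acks=1)."""
    leader_log.append(record)
    acknowledged.append(record)


write_to_leader_only("msg-2")  # the producer believes this write succeeded

# The leader's machine crashes before the follower fetches "msg-2";
# the follower becomes the new leader with only what it already had.
new_leader_log = list(follower_log)

lost = [r for r in acknowledged if r not in new_leader_log]
print(lost)  # ['msg-2']
```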

(5) What is Kafka's ISR mechanism?

Let's set aside for a moment how to solve this problem and first review a core mechanism of Kafka: the ISR (in-sync replicas) mechanism.

Simply put, Kafka automatically maintains an ISR list for each partition. The list contains the leader, as well as the followers that are keeping up with the leader.

In other words, as long as a follower keeps its data in sync with the leader, it stays in the ISR list.

But if a follower runs into a problem of its own and cannot fetch data from the leader in time, it is considered "out-of-sync" and is removed from the ISR list.

So, put plainly, the ISR is the set of replicas, automatically maintained and monitored by Kafka, that are keeping up with the leader's data in a timely manner.
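Kafka decides whether a follower is lagging by time, controlled by the broker setting `replica.lag.time.max.ms`. The sketch below is a simplified model of that idea, assuming a single "last caught up" timestamp per follower; the threshold value and the data structures are illustrative, not Kafka's internals.

```python
from dataclasses import dataclass

# Simplified stand-in for the broker setting replica.lag.time.max.ms;
# the value is arbitrary and chosen only for this example.
MAX_LAG_MS = 10_000


@dataclass
class FollowerState:
    broker: str
    last_caught_up_ms: int  # last time this follower had everything the leader had


def compute_isr(leader: str, followers, now_ms: int):
    """Return the ISR: the leader plus every follower that caught up recently enough."""
    isr = [leader]
    for f in followers:
        if now_ms - f.last_caught_up_ms <= MAX_LAG_MS:
            isr.append(f.broker)
    return isr


followers = [
    FollowerState("broker-2", last_caught_up_ms=95_000),  # healthy
    FollowerState("broker-3", last_caught_up_ms=70_000),  # stuck, e.g. slow disk or long GC
]
print(compute_isr("broker-1", followers, now_ms=100_000))  # ['broker-1', 'broker-2']
```

Once broker-3 catches up again, recomputing the list would add it back to the ISR.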

(6) How do you guarantee that data written to Kafka is not lost?

So if you want data written to Kafka not to be lost, you need to meet the following requirements:

1. Each partition must have at least one follower in its ISR list that is keeping up with the leader's data.
2. Every write must succeed at least on the partition leader and also on at least one follower in the ISR before it is considered successful.
3. If these two conditions are not met, the write is treated as failed, and the producer keeps retrying until both conditions are met and the write succeeds.
4. Configure the corresponding parameters according to the ideas above to ensure that data written to Kafka is not lost.
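These requirements correspond to real Kafka settings: the replication factor chosen when a topic is created, the topic/broker setting `min.insync.replicas`, and the producer settings `acks` and `retries`. The Python sketch below only records them in plain dictionaries and checks them against the requirements; it does not connect to Kafka, the helper `check_no_loss_settings` is hypothetical, and the values are common choices rather than settings prescribed by this article. `unclean.leader.election.enable=false` is included because it prevents an out-of-sync replica from being elected leader.

```python
# Real Kafka configuration names; the dictionaries and the checker are only an illustration.
topic_settings = {
    "replication.factor": 3,                  # set at topic creation; 3 is a common choice
    "min.insync.replicas": 2,                 # leader + at least one in-sync follower
    "unclean.leader.election.enable": False,  # never elect an out-of-sync replica as leader
}

producer_settings = {
    "acks": "all",          # wait for the in-sync replicas, not just the leader
    "retries": 2147483647,  # keep retrying failed sends
}


def check_no_loss_settings(topic: dict, producer: dict) -> list:
    """Hypothetical helper: list the problems found; an empty list means the setup matches."""
    problems = []
    if topic["replication.factor"] < 2:
        problems.append("need at least one follower replica")
    if topic["min.insync.replicas"] < 2:
        problems.append("a write must reach the leader and at least one follower")
    if str(producer["acks"]) not in ("all", "-1"):
        problems.append("producer must wait for in-sync replicas (acks=all)")
    if producer["retries"] <= 0:
        problems.append("producer must retry failed writes")
    if topic.get("unclean.leader.election.enable", False):
        problems.append("out-of-sync replicas could become leader")
    return problems


print(check_no_loss_settings(topic_settings, producer_settings))  # []
```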

Good! Now let's analyze these requirements one by one.

First, there must be at least one follower in the ISR list.

This is essential. If the leader has no follower at all, or none of its followers can synchronize the leader's data in time, then this approach simply cannot work.

Second, every write must succeed not only on the leader but also on at least one follower in the ISR.

Look at the figure below. This requirement ensures that each write is considered successful only after it has been written to both the leader and a follower, so every successfully written piece of data has at least two copies.

Then, if the leader goes down, Kafka can switch over to the follower, which already holds the data just written, so no data is lost.
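The second requirement can be modeled as a write that is acknowledged only once it has reached the leader and at least one in-sync follower. In the sketch below, `MIN_INSYNC` plays the role of `min.insync.replicas`, and the exception name loosely mirrors Kafka's `NotEnoughReplicasException`; everything else is a hypothetical in-memory model.

```python
class NotEnoughReplicas(Exception):
    """Raised when too few in-sync replicas are available to accept a write."""


MIN_INSYNC = 2  # leader + at least one follower, as required above


def replicated_write(record, isr_logs):
    """Append record to every in-sync replica's log; refuse if the ISR is too small.

    isr_logs maps broker name -> that replica's log; the first entry is the leader.
    """
    if len(isr_logs) < MIN_INSYNC:
        raise NotEnoughReplicas(f"only {len(isr_logs)} in-sync replica(s)")
    for log in isr_logs.values():
        log.append(record)
    return "ack"


isr = {"broker-1": [], "broker-2": []}  # leader broker-1, follower broker-2
replicated_write("order-42", isr)

# The leader crashes; the follower, which already holds the record, takes over.
del isr["broker-1"]
new_leader = next(iter(isr))
print(new_leader, isr[new_leader])  # broker-2 ['order-42']
```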

As shown in the figure above, suppose the leader currently has no follower, or the data has just been written to the leader and the leader goes down before the follower has had time to synchronize it.

In this case the write fails, and you let the producer keep retrying until Kafka recovers and meets the conditions above; only then does writing continue.
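On the producer side, "keep retrying" can be sketched as a loop around the send call. In practice the Kafka producer can retry by itself (the `retries` setting); the loop below only shows the idea in application code. `fake_send`, the ISR-size sequence, and the attempt cap are assumptions for the demo, whereas the approach described here keeps retrying until the cluster recovers.

```python
import time


class NotEnoughReplicas(Exception):
    """Stand-in for the broker rejecting a write because the ISR is too small."""


def send_with_retry(send, record, max_attempts=10, backoff_s=0.01):
    """Retry a write until it is acknowledged or the attempt cap is reached."""
    for _ in range(max_attempts):
        try:
            return send(record)
        except NotEnoughReplicas:
            time.sleep(backoff_s)  # give the cluster time to recover, then try again
    raise RuntimeError(f"gave up after {max_attempts} attempts")


# Hypothetical cluster: the ISR contains only the leader for the first two attempts.
isr_sizes = iter([1, 1, 2])


def fake_send(record):
    if next(isr_sizes) < 2:
        raise NotEnoughReplicas("ISR too small")
    return f"acked {record}"


print(send_with_retry(fake_send, "order-7"))  # acked order-7
```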

This ensures that data written to Kafka is not lost.

(7) Summary

To sum up, data loss in Kafka actually involves many aspects.

For example, buffering on the producer side, the consumer side, and Kafka's own underlying algorithms and internal mechanisms can all lead to data loss.

However, the biggest problem usually encountered when writing data is data loss during a leader switchover. So this article only discusses a solution to that specific problem.

End

Author: stone architecture cedar notes
Link: https://juejin.im/post/5c6a9f25518825787e69e70a
Source: Nuggets
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

Origin: blog.csdn.net/hmh13548571896/article/details/104106643