Some interview questions about Kafka

Last week I sat in as a guest interviewer. Here I'll simply record some of the Kafka interview questions I asked during that period; they are the key points I usually summarize while studying Kafka.

  • Talk about your overall understanding of Kafka.

I ask this question mainly to see how well the candidate understands Kafka overall, and to get a general sense of how familiar they are with Kafka's concepts, such as message, topic, partition, replica, offset, rebalance, leader/follower, ISR, and so on.

  • Why does Kafka achieve such high throughput?

Multiple partitions, batched sends, Kafka's Reactor network model, page cache, sendfile zero-copy, and data compression.

  • Talk about your understanding of the producer's message-sending mechanism.

The Sender thread mechanism, the role of the ByteBuffer buffer pool, and so on.
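A minimal sketch of what this looks like from the client side (the topic name and broker address are placeholders): send() only appends the record to the in-memory buffer and returns immediately, while the background Sender thread drains batches to the broker.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AsyncSendSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() only appends the record to the buffer (accumulator) and
            // returns right away; the background Sender thread batches buffered
            // records and ships them to the broker.
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();
                        } else {
                            System.out.printf("sent to %s-%d@%d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                        }
                    });
        } // close() waits for the Sender thread to drain what is still buffered
    }
}
```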

  • How can Kafka throughput be improved?

On the producer side, tune the batch.size and linger.ms parameters; also allocate topic partitions reasonably.
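As a rough sketch (the values are illustrative, not recommendations; the right numbers depend on message size and latency budget), the producer-side knobs look like this:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ThroughputTuning {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);   // allow larger batches per partition
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);           // wait up to 20 ms to fill a batch
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // compress whole batches
        return props;
    }
}
```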

  • Is the producer thread-safe? Give single-threaded and multi-threaded usage examples and their pros and cons.
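For context, KafkaProducer is thread-safe, so the usual pattern is to share one instance across threads; a minimal sketch (thread count, topic, and broker address are made up):

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SharedProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // One producer instance shared by all worker threads: their records go
        // through the same accumulator, so batching and compression stay effective.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            final int id = i;
            pool.submit(() -> producer.send(
                    new ProducerRecord<>("demo-topic", "worker-" + id, "hello")));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        producer.close();
    }
}
```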

  • Is the consumer thread-safe? Give examples of the multi-threaded model, the single-threaded model, and the single consumer + multi-threaded worker model, with their pros and cons.
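KafkaConsumer, by contrast, is not thread-safe; a common compromise is the single consumer + multi-threaded worker model sketched below (names and sizes are placeholders). The trade-off is that offset management becomes harder, since commits can run ahead of the workers.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerWithWorkers {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "demo-group");              // placeholder group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        // Only this thread ever touches the (non-thread-safe) consumer.
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("demo-topic"));
        ExecutorService workers = Executors.newFixedThreadPool(8);

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                // Hand-off decouples polling from processing but complicates
                // offset commits relative to the plain single-threaded model.
                workers.submit(() -> System.out.printf("processed %s@%d%n",
                        record.key(), record.offset()));
            }
        }
    }
}
```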

  • After messages are pulled, under what circumstances can they end up being consumed repeatedly? Talk about your understanding of offset commits.

This requires understanding the three message delivery semantics:

At most once: a message may be lost, but it is processed at most once;

At least once: messages are not lost, but may be processed multiple times;

Exactly once: every message is processed, and processed exactly once.

If the consumer commits the offset before consuming, that gives "at most once"; if it commits the offset after consuming, that gives "at least once"; if consumption and the offset commit can be made to execute in a single transaction, "exactly once" can be guaranteed. The candidate should also show some understanding of __consumer_offsets.
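A minimal sketch of the "at least once" pattern, assuming auto-commit is turned off and using placeholder topic/group names: the work is done first and the offset committed afterwards, so a crash between the two leads to reprocessing rather than loss. Committing before processing would flip this to "at most once".

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "demo-group");              // placeholder group
        props.put("enable.auto.commit", "false");         // we commit offsets manually
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);   // do the work first ...
                }
                consumer.commitSync(); // ... then commit the offsets: at least once
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("%s@%d%n", record.key(), record.offset());
    }
}
```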

  • When does a consumer group rebalance occur, which parameters are involved in rebalancing, and what are the consequences of frequent rebalances?

Consumer group membership changes, the number of subscribed topics changes, or the subscription changes; the related parameters are session.timeout.ms, max.poll.interval.ms, and heartbeat.interval.ms.
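A sketch of how these timeouts relate on the consumer side (values are illustrative only): heartbeats must arrive well within session.timeout.ms, and each round of processing must finish within max.poll.interval.ms, otherwise the member is kicked out of the group and a rebalance is triggered.

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class RebalanceTuning {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30_000);    // no heartbeat for 30 s => member is considered dead
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 3_000);  // heartbeat every 3 s, well under the session timeout
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300_000); // poll() must be called at least every 5 min
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 200);         // smaller batches keep processing within the poll interval
        return props;
    }
}
```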

Related article: Kafka's rebalance mechanism

  • Kafka does not support automatic partition reassignment by default, so if you want to reassign partitions, what steps are involved, and what does Kafka do during the reassignment process?

Concepts such as RAR, OAR, AR, RAR-OAR, OAR-RAR, and the phases of the reassignment.

Related articles: Notes from a Kafka online scale-out; Kafka partition reassignment source code analysis

  • Talk about your understanding of preferred leader election.

After a broker goes down, partition leaders shift, and over time leadership becomes unbalanced across the cluster. Kafka treats the first replica in a partition's replica list as the preferred leader; after the broker comes back, Kafka can re-elect the preferred replica as the partition leader. Preferred leader election can be automatic or triggered manually; the related parameter is auto.leader.rebalance.enable, and by default up to 10% imbalance is tolerated, and so on.
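For the manual side, recent client versions expose preferred leader election through the Admin API; a sketch (topic, partition, and broker address are placeholders, and the call assumes a client version that supports electLeaders):

```java
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.ElectionType;
import org.apache.kafka.common.TopicPartition;

public class PreferredLeaderElection {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker

        try (Admin admin = Admin.create(props)) {
            // Manually trigger a preferred-leader election for one partition;
            // with auto.leader.rebalance.enable=true the controller does the
            // same thing periodically once the imbalance threshold is exceeded.
            admin.electLeaders(ElectionType.PREFERRED,
                    Set.of(new TopicPartition("demo-topic", 0)))
                 .partitions().get();
        }
    }
}
```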

  • Talk about your understanding of ISR replica synchronization. What are the shortcomings of ISR replica synchronization?

Related article: Kafka's ISR replica synchronization mechanism

  • Talk about your understanding of the high-watermark backup mechanism. How are the LEO and HW updated?

Related article: An illustrated guide to Kafka's high-watermark backup mechanism

  • What are the defects of the high-watermark backup mechanism (data loss, data divergence between replicas)? How are they solved (leader epoch)?

Related article: An illustrated guide to Kafka's high-watermark backup mechanism

  • Talk about your understanding of the controller mechanism. What are the controller's main functions?

Updating cluster metadata, creating topics, deleting topics, partition reassignment, preferred leader replica election, expanding topic partitions, handling brokers joining the cluster, broker crashes, controlled shutdowns, and controller leader election.

  • How does Kafka's log storage mechanism work?

Each partition has its own log (partition log) that is written sequentially; when it reaches a certain size it rolls over into log segment files, and each log segment file has corresponding index files (.index, .timeindex), and so on.
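As a hypothetical illustration of the on-disk layout (directory name and offsets are made up), a partition directory looks roughly like this, with each segment named after the base offset of its first message:

```
demo-topic-0/
├── 00000000000000000000.log        # first segment, messages starting at offset 0
├── 00000000000000000000.index      # offset index for that segment
├── 00000000000000000000.timeindex  # timestamp index for that segment
├── 00000000000003452103.log        # next segment, rolled once the size limit was reached
├── 00000000000003452103.index
└── 00000000000003452103.timeindex
```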

  • Does having more partitions always mean better Kafka performance? Why?

My understanding:

  1. Each partition corresponds to a log file, and each log file is written sequentially, but if many partitions flush to disk at the same time the writes effectively become random; I guess this is one of the reasons a RocketMQ broker uses a single CommitLog.
  2. The client handles partition messages with per-partition threads; the more partitions, the more threads, which to a certain extent causes heavy thread-switching overhead.
  3. After a broker goes down, especially one hosting many partitions, the time Kafka needs to re-elect partition leaders increases greatly.
  4. Each partition has corresponding file handles; the more partitions, the more file handles the system keeps open.
  5. The client allocates a certain amount of buffer memory for each partition; the more partitions, the more memory is allocated.

For more articles, follow the public account "Back-end Advanced", which focuses on back-end technology.

Follow the account and reply "back-end" to receive free back-end e-books.

Feel free to share and repost; please keep the source attribution.


Origin: juejin.im/post/5df8cb095188251262050409