Big Data: Kafka Key Concepts

1. What do Kafka's ISR and AR represent?

Kafka requires that all replicas in the ISR complete synchronization before a write is considered successfully committed.

AR: all replicas of a partition.

1. AR

    AR (Assigned Replicas) is the list Kafka maintains of all replicas of a partition. AR is divided into ISR and OSR.

    AR = ISR + OSR.

    The AR, ISR, OSR, LEO and HW information is all stored in ZooKeeper.

1. ISR

    ISR (In-Sync Replicas) is the set of replicas that stay synchronized with the leader. Data is considered successfully committed only after every replica in the ISR has finished synchronizing it, and only committed data can be accessed by the outside world.

    While synchronization is still in progress, the newly written data cannot be accessed by the outside world; this is implemented by the LEO-HW mechanism.

2. OSR

    Whether a copy of data synchronization within the OSR leader, does not affect the data submitted, follower in the OSR try to synchronize leader, version data may be left behind.

    Initially all the copies are in the ISR, in the process kafka work, if a copy of sync speed slower than replica.lag.time.max.ms specified threshold, were kicked out of the ISR into OSR, if the follow-up speed recovery can return to the ISR.
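    To see these sets in practice, the topic describe command lists each partition's leader, replica set (AR) and current ISR. A minimal example, reusing the ZooKeeper address and topic name shown later in this article; the output line is illustrative and abridged:

    bin/kafka-topics.sh --zookeeper localhost:2181/kafka --describe --topic topic-config

    Topic: topic-config  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2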

3. LEO

    LogEndOffset: the offset of the latest data in the partition. As soon as data is written to the leader, the LEO immediately advances to the newest position. It is, in effect, the marker of the latest written data.

4. HW

    HighWatermark: data is considered committed only after it has been synchronized to all replicas in the ISR, and the HW is then updated to that position. Only data before the HW can be accessed by consumers, which guarantees that data whose synchronization is not yet complete is never visible to consumers. The HW serves as the marker of data that all replicas have synchronized.

    After the leader goes down, a new leader is chosen only from the ISR list. No matter which ISR replica is chosen as the new leader, it is guaranteed to hold all data before the HW, so after the leader switch consumers can still read the data committed before the HW.

    In short, the LEO marks the position of the latest written data, while the HW marks the data whose synchronization is complete; only data before the HW is accessible to the outside world.
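    The relationship between the two marks can be summed up as HW = min(LEO over the ISR). The toy sketch below is not Kafka's implementation, only an illustration of that arithmetic: with ISR LEOs of 10, 8 and 7, the HW is 7, so consumers can read offsets 0 through 6.

    import java.util.Arrays;
    import java.util.List;

    public class HighWatermarkSketch {
        // Toy illustration only: the high watermark is the minimum LEO across the ISR replicas.
        static long highWatermark(List<Long> isrLeos) {
            return isrLeos.stream().mapToLong(Long::longValue).min().orElse(0L);
        }

        public static void main(String[] args) {
            // Leader LEO = 10, follower LEOs = 8 and 7  ->  HW = 7,
            // so offsets 0..6 are visible to consumers.
            System.out.println(highWatermark(Arrays.asList(10L, 8L, 7L)));   // prints 7
        }
    }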

5. HW truncation mechanism

    If the leader goes down and a new leader is elected, the new leader cannot guarantee that it holds all of the previous leader's data, only the data up to the previous HW. In that case all followers must truncate their data to the HW position and then synchronize from the new leader, which guarantees data consistency.

    When the downed leader recovers and finds that its data is inconsistent with the new leader's, it truncates its own data back to the HW position it recorded before going down and then synchronizes from the new leader. For example, if the old leader had written up to LEO 10 but the HW was 8 when it crashed, on recovery it truncates back to offset 8 and re-fetches everything after that. The recovered leader then behaves like any other follower, synchronizing data to keep the partition consistent.

Kafka storage mechanism

    Kafka stores data by topic. A topic contains partitions, a partition can have multiple replicas, and each partition is further divided into several segments.

    A so-called partition is actually a folder created under Kafka's storage directory; the folder name is the topic name plus the partition number, numbered starting from zero.

1. segment

    A so-called segment is actually a group of files produced under the partition folder.

    A partition is divided into a number of equally sized segments. On the one hand this splits the partition's data across multiple files and avoids any single oversized file; on the other hand it allows historical data to be deleted segment by segment, which improves efficiency.

    A segment consists of one .log file and one .index file.

1. .log

    The .log file is the data file, used to store the segment's data.

2. .index

    The .index file is the index file; it holds index information for the corresponding .log file.

    By looking up the .index file, the start position within the .log file of any offset stored in the current segment can be found. Each log record has a fixed format containing the offset, the record length, the key length and other fields, so the end position of the current record can be determined from this fixed format and the data read out.

3. Naming rules

    The two files are named as follows:

    The first segment of a partition starts from 0. Each subsequent segment file is named after its base offset, i.e. the offset following the last message of the previous segment. The value is a 64-bit number rendered as 20 digits, left-padded with zeros.
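    For example, the files of a partition with three segments might look like this (the base offsets are illustrative):

    00000000000000000000.index
    00000000000000000000.log
    00000000000000170410.index
    00000000000000170410.log
    00000000000000239430.index
    00000000000000239430.log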

2. Reading data

    Reading starts from a specified offset in a given partition. First, the offset is compared against the base offsets in the names of all segments of the partition to determine which segment holds the data. Then that segment's index file is consulted to determine the start position of the offset within the data file. Finally, the data file is read from that position, and the fixed record format is used to determine where the record ends, yielding one complete record.
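    To make the lookup path concrete, here is a rough sketch of the search-by-offset idea described above. It is not broker code, only an illustration: step 1 binary-searches the base offsets taken from the segment file names; steps 2 and 3 (index lookup and log scan) are indicated in the comments.

    import java.util.Map;
    import java.util.NavigableMap;
    import java.util.TreeMap;

    public class SegmentLookupSketch {
        // Base offset of each segment (the 20-digit file name) -> segment file name.
        // The String values stand in for real .log/.index file handles.
        private final NavigableMap<Long, String> segmentsByBaseOffset = new TreeMap<>();

        void addSegment(long baseOffset, String name) {
            segmentsByBaseOffset.put(baseOffset, name);
        }

        // Step 1: the segment containing `offset` is the one with the greatest base offset <= offset.
        String findSegment(long offset) {
            Map.Entry<Long, String> entry = segmentsByBaseOffset.floorEntry(offset);
            return entry == null ? null : entry.getValue();
        }

        public static void main(String[] args) {
            SegmentLookupSketch log = new SegmentLookupSketch();
            log.addSegment(0L, "00000000000000000000.log");
            log.addSegment(170410L, "00000000000000170410.log");
            log.addSegment(239430L, "00000000000000239430.log");

            // Offset 170417 falls in the segment whose base offset is 170410.
            // Step 2 (not shown): look up the relative offset 170417 - 170410 in
            // 00000000000000170410.index to obtain a byte position in the .log file.
            // Step 3 (not shown): scan the .log file from that position, using the
            // fixed-size record header to step from record to record.
            System.out.println(log.findSegment(170417L));   // prints 00000000000000170410.log
        }
    }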

 

2. Producer reliability levels

    The explanation above covers reliability inside the Kafka cluster. But when a producer sends data to the Kafka cluster, the data travels over the network, which is not reliable; data may be lost because of network latency, flapping connections and so on.

    Kafka therefore offers producers three reliability levels, using different strategies to provide different reliability guarantees.

    In essence, this setting configures when the leader responds to the client with a success message after receiving data.

    It is set through the request.required.acks parameter:

    1: The producer sends data to the leader; the leader returns a success message as soon as it has received the data, and only after receiving that message does the producer consider the send successful. If no success message is received, the producer considers the send failed and automatically retransmits the data.

    If the leader goes down, data may be lost.

    0: The producer keeps sending data to the leader without requiring any success feedback from the leader.

    This mode has the highest efficiency and the lowest reliability. Data may be lost during transmission, and data may also be lost when the leader goes down.

    -1: The producer sends data to the leader; after receiving the data, the leader waits until all replicas in the ISR list have finished synchronizing it before returning a success message to the producer. If no success message is received, the producer considers the send failed and automatically retransmits the data.

    This mode is highly reliable, but if the ISR list contains only the leader, data may still be lost when the leader goes down.

    This case can be handled with min.insync.replicas, which specifies the minimum number of replicas the ISR must contain. The default value is 1; it needs to be set to 2 or greater.

    With that setting, if a producer sends data to the leader but the ISR contains only the leader itself, an exception is returned indicating that the write failed. The data cannot be written in that situation, which guarantees it is definitely not lost.

    Although data is then never lost, duplicates may be produced. For example, the producer sends data to the leader, and the leader goes down when it has synchronized the data to only half of the ISR followers. A new leader is elected that may already hold this partially committed data, while the producer, having received a failure message, retransmits it; the new leader then receives the same data twice.
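    A minimal producer sketch for the strictest level, using the Java client. The broker address and topic name are illustrative; with the modern Java producer the property is called acks, and acks=-1 is equivalent to acks=all. For the min.insync.replicas=2 protection described above, that property must also be set on the broker or on the topic.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ReliableProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");            // assumed broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            props.put("acks", "-1");     // wait for every replica in the ISR before acknowledging
            props.put("retries", "3");   // retransmit automatically when no success message arrives

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("topic-config", "key", "value"),
                        (metadata, exception) -> {
                            if (exception != null) {
                                // e.g. NotEnoughReplicasException when the ISR is smaller
                                // than min.insync.replicas: the write is rejected, not lost.
                                exception.printStackTrace();
                            }
                        });
            }
        }
    }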

3. Leader election

    When the leader goes down, a follower from the ISR is chosen to become the new leader. But what if all the replicas in the ISR are down?

    The following configuration addresses this problem:

    unclean.leader.election.enable=false

    Strategy 1: wait until a replica from the ISR list comes back alive and choose it as the leader to continue working.

    unclean.leader.election.enable=true

    Strategy 2: choose any surviving replica to become the leader and continue working, even if it is not a follower from the ISR.

    Strategy 1 guarantees reliability but has low availability: Kafka can only recover after the leader that went down last comes back alive.

    Strategy 2 has high availability: work can continue as long as any replica survives, but reliability is not guaranteed and data inconsistency may occur.

4. Kafka delivery guarantees

    At most once: messages may be lost, but are never redelivered.

    At least once: messages are never lost, but may be redelivered.

    Exactly once: each message is delivered once and only once.

    Kafka guarantees at most At least once: it can guarantee that messages are not lost, but they may be duplicated. To resolve duplicates, a unique identifier and a deduplication mechanism must be introduced. Kafka provides a GUID as a unique identifier, but it does not provide a built-in deduplication mechanism; developers must deduplicate themselves according to their business rules.
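    A minimal sketch of the usual at-least-once consumption pattern with the Java client, assuming an illustrative broker address, group id and topic: auto-commit is disabled, records are processed first, and the offset is committed only afterwards, so a crash between the two steps leads to redelivery rather than loss.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class AtLeastOnceConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");    // assumed broker address
            props.put("group.id", "demo-group");                 // illustrative group id
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());
            props.put("enable.auto.commit", "false");            // commit manually, only after processing

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("topic-config"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Business processing first; deduplication by business key would go here.
                        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                    }
                    consumer.commitSync();                        // ... then commit the offsets
                }
            }
        }
    }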

 

 

1. What do Kafka's ISR and AR represent?

    ISR: the set of replicas that keep in sync with the leader

    AR: all replicas of a partition

2. What do Kafka's HW and LEO represent?

    LEO: the offset of the last message in each replica

    HW: the smallest LEO among all replicas of a partition

3. How is message ordering reflected in Kafka?

    Each message in each partition has an offset, so ordering can only be guaranteed within a partition.

4. Do you know Kafka's partitioner, serializer and interceptors? What is the processing order between them?

    Interceptor -> serializer -> partitioner (a configuration sketch follows)
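    A minimal sketch of where each of the three plugs into the producer configuration; the interceptor and partitioner class names are hypothetical placeholders, and the broker address is illustrative:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerPluginConfigSketch {
        public static Properties build() {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed address
            // 1) Interceptors run first, before serialization.
            props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, "com.example.MyProducerInterceptor");   // hypothetical class
            // 2) The key/value serializers then turn objects into bytes.
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // 3) Finally the partitioner decides which partition each record goes to.
            props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.example.MyPartitioner");   // hypothetical class
            return props;
        }
    }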

5. What does the overall structure of the Kafka producer client look like? How many threads does it use, and what are they?

    Two threads: the main thread, which passes records through the interceptors, serializer and partitioner into a record accumulator (buffer), and the Sender (I/O) thread, which pulls batches from the accumulator and sends them to the brokers.

6. "The number of consumer spending in the group if the topic of partition over, then there will be less consumer spending data," this sentence is correct?

    correct

7. Consumers submitted when submitting the consumption displacement or offset current consumption is offset to the latest news of + 1?

    offset+1

8. Under what circumstances does duplicate consumption occur?

    Consuming first and committing the offset afterwards: if the consumer fails after processing but before committing, the same data is consumed again.

9. What scenarios cause messages to be missed by consumers?

    Committing the offset first and consuming afterwards: if the consumer fails after committing but before processing, that data is never consumed and is effectively lost to the consumer.

10. When you use kafka-topics.sh to create (or delete) a topic, what logic does Kafka execute behind the scenes?

    1) A new topic node is created under the /brokers/topics node in ZooKeeper, e.g. /brokers/topics/first

    2) The Controller's listener is triggered

    3) The Kafka Controller carries out the actual topic creation work and updates the metadata cache

11. Can the number of partitions of a topic be increased? If so, how? If not, why?

Yes, it can be increased:

bin/kafka-topics.sh --zookeeper localhost:2181/kafka --alter --topic topic-config --partitions 3

12. Can the number of partitions of a topic be decreased? If so, how? If not, why?

    No, it cannot: the data of the removed partitions would be difficult to handle.

13. Does Kafka have internal topics? If so, what are they and what are they used for?

    __consumer_offsets, which stores consumer offsets

14. What is the concept of partition assignment in Kafka?

    A topic has multiple partitions and a consumer group has multiple consumers, so partitions need to be assigned to the consumers (round-robin or range); a configuration sketch follows.
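    A minimal sketch showing how the assignment strategy is chosen on the consumer side (the broker address and group id are illustrative); RangeAssignor is the default, and RoundRobinAssignor spreads partitions more evenly across consumers:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.RoundRobinAssignor;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class AssignorConfigSketch {
        public static Properties build() {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed address
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");                // illustrative group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Switch from the default range assignment to round-robin assignment.
            props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, RoundRobinAssignor.class.getName());
            return props;
        }
    }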

15. Outline Kafka's log directory structure.

    Each partition corresponds to a folder named like topic-0, topic-1, which internally contains the .log and .index files.

16. If I specify an offset, how does Kafka find the corresponding message?

    It locates the segment by comparing the offset against the base offsets in the segment file names, looks the offset up in that segment's .index file to get the position in the .log file, and then reads the .log file from that position.

17. What is the role of the Kafka Controller?

    It is responsible for managing broker online/offline status in the cluster, partition replica assignment for all topics, leader election, and so on.

18. Where are elections needed in Kafka? What election strategies do they use?

    Partition leader (chosen from the ISR), controller (first come, first served)

19. What is a failed replica? How is it handled?

    A replica that cannot keep in sync with the leader in time is kicked out of the ISR; after it catches up with the leader it rejoins the ISR.

20. Which design choices give Kafka such high performance?

    Partitioning, sequential disk writes, zero-copy

All server.properties configuration parameters are listed and explained below:

 

 

Parameter

Description

broker.id =0

Unique identifier of each broker in the cluster; must be a non-negative integer. When the server's IP address changes, broker.id does not change, and consumption of messages is not affected

log.dirs=/data/kafka-logs

Directory where Kafka stores its data. Multiple directories can be given, separated by commas, e.g. /data/kafka-logs-1,/data/kafka-logs-2

port =9092

broker server service port

message.max.bytes =6525000

It represents the maximum size of the message body, in bytes

num.network.threads =4

Maximum number of threads the broker uses to process messages; generally does not need to be changed

num.io.threads =8

Number of threads the broker uses for disk I/O; the value should be greater than the number of disks

background.threads =4

Number of threads for background tasks, for example deleting expired message files; normally does not need to be changed

queued.max.requests =500

Maximum number of requests allowed to wait in the I/O thread queue; if the number of waiting requests exceeds this value, the broker stops accepting external messages. This acts as a self-protection mechanism.

host.name

Broker host address. If set, the broker binds to this address; if not set, it binds to all interfaces and publishes one of them to ZooKeeper. Generally not set

socket.send.buffer.bytes=100*1024

Socket send buffer, the SO_SNDBUFF socket tuning parameter

socket.receive.buffer.bytes =100*1024

Socket receive buffer, the SO_RCVBUFF socket tuning parameter

socket.request.max.bytes =100*1024*1024

Maximum size of a socket request, to prevent the server from running out of memory. message.max.bytes must be smaller than socket.request.max.bytes. Will be overridden by the parameter specified when the topic is created

log.segment.bytes =1024*1024*1024

A topic partition is stored as a set of segment files; this controls the size of each segment. Will be overridden by the parameter specified when the topic is created

log.roll.hours =24*7

Even if a log segment has not reached the size set by log.segment.bytes, a new segment is forced to be created after this amount of time. Will be overridden by the parameter specified when the topic is created

log.cleanup.policy = delete

Log cleanup policy; the options are delete and compact. It mainly governs how expired data, or data beyond the log size limit, is handled. Will be overridden by the parameter specified when the topic is created

log.retention.minutes=3days

Maximum time data is stored. Data older than this is processed according to the policy set in log.cleanup.policy; in other words, it determines how long consumers have to consume the data.

Deletion is performed as soon as either log.retention.bytes or log.retention.minutes reaches its limit. Will be overridden by the parameter specified when the topic is created

log.retention.bytes=-1

Maximum file size of each partition of a topic; a topic's size limit = number of partitions * log.retention.bytes. -1 means no size limit. Deletion is performed as soon as either log.retention.bytes or log.retention.minutes reaches its limit. Will be overridden by the parameter specified when the topic is created

log.retention.check.interval.ms=5minutes

Interval at which file sizes are checked to see whether they should be handled according to the policy set in log.cleanup.policy

log.cleaner.enable=false

Whether log compaction is enabled

log.cleaner.threads = 2

Number of threads running log compaction

log.cleaner.io.max.bytes.per.second=None

Maximum I/O throughput for log compaction (bytes per second)

log.cleaner.dedupe.buffer.size=500*1024*1024

Cache space used for deduplication during log compaction; the larger the better, space permitting

log.cleaner.io.buffer.size=512*1024

I/O block size used during log cleaning; generally does not need to be changed

log.cleaner.io.buffer.load.factor =0.9

Expansion factor of the hash table used in log cleaning; generally does not need to be changed

log.cleaner.backoff.ms =15000

Interval at which to check whether log cleaning should be triggered

log.cleaner.min.cleanable.ratio=0.5

Controls the frequency of log cleaning; a larger value means more efficient cleaning but also some wasted space. Will be overridden by the parameter specified when the topic is created

log.cleaner.delete.retention.ms =1day

Maximum retention time for compacted logs, which is also the maximum time a client has to consume those messages. Whereas log.retention.minutes controls uncompacted data, this parameter controls compacted data. Will be overridden by the parameter specified when the topic is created

log.index.size.max.bytes =10*1024*1024

Size limit of a segment's index file. Will be overridden by the parameter specified when the topic is created

log.index.interval.bytes =4096

When a fetch operation is performed, a certain amount of space around the nearest offset is scanned. The larger this is set, the faster the scan, but the more memory it uses; generally this parameter does not need to be touched

log.flush.interval.messages=None

Number of messages accumulated before the log file is synced to disk. Disk I/O is a slow operation, but it is also a necessary means of ensuring data reliability, so setting this parameter is a trade-off between data reliability and performance. If the value is too large, each fsync takes a long time (I/O blocking); if it is too small, fsync happens very often, which adds some latency to overall client requests. If the physical server fails, messages that have not been fsynced are lost.

log.flush.scheduler.interval.ms =3000

Interval at which to check whether the log needs to be flushed to disk

log.flush.interval.ms = None

Controlling the timing of disk writes only by message count is not sufficient. This parameter controls the time interval between fsyncs: if the message count has not reached the threshold but the time since the last disk sync has reached this interval, a flush is also triggered.

log.delete.delay.ms =60000

How long a file is retained after it has been removed from the index; generally does not need to be changed

log.flush.offset.checkpoint.interval.ms =60000

Records the point in time of the last flush to disk, to aid data recovery; generally does not need to be changed

auto.create.topics.enable =true

Whether topics may be created automatically; if false, topics must be created with the command-line tools

default.replication.factor =1

Default replication factor for automatically created topics

num.partitions =1

Default number of partitions for each topic; overridden by the value specified when the topic is created

 

 

The following are the leader and replica configuration parameters in Kafka:

 

controller.socket.timeout.ms =30000

Socket timeout for communication between the partition leader and replicas

controller.message.queue.size=10

Message queue size for data synchronization between the partition leader and replicas

replica.lag.time.max.ms =10000

Maximum time to wait for a replica to respond to the partition leader. If a replica exceeds this time, it is removed from the ISR (in-sync replicas), considered dead, and no longer managed as in sync

replica.lag.max.messages =4000

If a follower falls too far behind the leader, that follower (i.e. the partition replica) is considered to have failed.

## Usually, when a follower communicates with the leader, network latency or dropped connections cause the replica's message synchronization to lag.

## If it lags by too many messages, the leader considers the follower's network latency too high or its message throughput too limited, and removes that replica from the ISR.

## In environments with few brokers or insufficient network capacity, raising this value is recommended.

replica.socket.timeout.ms=30*1000

Socket timeout between follower and leader

replica.socket.receive.buffer.bytes=64*1024

Socket buffer size used when replicating from the leader

replica.fetch.max.bytes =1024*1024

Maximum amount of data a replica fetches at a time

replica.fetch.wait.max.ms =500

Maximum wait time for communication between replicas and the leader; failures are retried

replica.fetch.min.bytes =1

Minimum data size of a fetch; if the leader has less unsynchronized data than this, the fetch blocks until the condition is met

num.replica.fetchers=1

Number of threads used for replication from the leader; increasing this value increases follower I/O

replica.high.watermark.checkpoint.interval.ms =5000

How often each replica checks whether to persist its high watermark to disk

controlled.shutdown.enable =false

Whether the controller is allowed to shut down brokers. If set to true, all leaders on this broker are closed and transferred to other brokers

controlled.shutdown.max.retries =3

Number of attempts for a controlled shutdown

controlled.shutdown.retry.backoff.ms =5000

Interval between shutdown attempts

leader.imbalance.per.broker.percentage =10

Allowed leader imbalance ratio; if this value is exceeded, partitions are rebalanced

leader.imbalance.check.interval.seconds =300

Interval at which to check whether leaders are imbalanced

offset.metadata.max.bytes

Maximum space for the offset metadata retained by clients

ZooKeeper configuration parameters in Kafka

 

zookeeper.connect = localhost:2181

Address of the ZooKeeper cluster; multiple addresses can be given, separated by commas: hostname1:port1,hostname2:port2,hostname3:port3

zookeeper.session.timeout.ms=6000

Maximum ZooKeeper session timeout, i.e. the heartbeat interval. If there is no response within this time, the broker is considered dead. Should not be set too large

zookeeper.connection.timeout.ms =6000

ZooKeeper connection timeout

zookeeper.sync.time.ms =2000

Synchronization time between the leader and followers in the ZooKeeper cluster

 
