Kafka Notes (Partitions and Segments)

Kafka Producer Configuration

acks  String, default acks=1 (acks=all since Kafka 3.0)

The number of acknowledgments the producer requires the leader to have received before considering a request complete.

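acks=0: the producer does not wait for any acknowledgment from the server. The record is immediately added to the socket buffer and considered sent. There is no guarantee that the server has received the record, and the retries setting has no effect (because the client generally cannot know whether a failure occurred). The offset returned for each record is always -1.
acks=1: the leader writes the record to its local log and responds without waiting for acknowledgment from all followers. If the leader fails right after acknowledging the record but before the followers have replicated it, the record is lost.
acks=all: the leader waits for the full set of in-sync replicas to acknowledge the record. This guarantees that the record is not lost as long as at least one in-sync replica remains alive. This is the strongest guarantee available. Equivalent to acks=-1.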
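The three acks levels above can be summarized as a small decision function. This is only a toy model for reasoning about the semantics, not Kafka client code; the function name and its parameters are hypothetical.

```python
def producer_considers_sent(acks, leader_written, followers_acked, followers_in_sync):
    """Toy model of when a produce request counts as complete.

    acks:              "0", "1", "all" or "-1" (as in the producer config)
    leader_written:    True once the leader appended the record to its local log
    followers_acked:   number of in-sync followers that have replicated the record
    followers_in_sync: total number of in-sync followers
    """
    if acks == "0":
        # Fire and forget: the record counts as sent once it is in the socket buffer.
        return True
    if acks == "1":
        # Only the leader's local write matters.
        return leader_written
    if acks in ("all", "-1"):
        # The leader write plus every in-sync follower must be done.
        return leader_written and followers_acked == followers_in_sync
    raise ValueError(f"unsupported acks value: {acks!r}")
```

With acks="1" the request completes while followers may still be behind, which is exactly the window in which a leader failure loses the record.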

 

retries  int, default 2147483647 since Kafka 2.1 (0 in earlier versions)

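Setting a value greater than zero makes the client resend any record whose send fails. Note that this retry is no different from the client resending the record itself after receiving the error. If max.in.flight.requests.per.connection is not set to 1, retries can change the order of records: if two batches are sent to the same partition and the first fails and is retried while the second succeeds, the records in the second batch end up ahead of those in the first.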

max.in.flight.requests.per.connection  int, default 5

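The maximum number of unacknowledged requests the client will send on a single connection before blocking. Note that if this is set greater than 1 and a send fails, there is a risk of message reordering due to retries (if retries are enabled).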
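The reordering risk described for retries and max.in.flight.requests.per.connection can be reproduced with a small simulation. This is a simplified sketch, not the real producer logic: the function deliver and its parameters are hypothetical, and it assumes a failed batch is simply resent after the rest of its in-flight window has been handled.

```python
from collections import deque

def deliver(batches, max_in_flight, fail_once):
    """Return the order in which batches land in one partition's log.

    batches:       batch names in the order the producer sends them
    max_in_flight: how many unacknowledged batches may be outstanding
    fail_once:     names of batches whose first attempt fails
    """
    pending = deque(batches)
    failed_already = set()
    log = []
    while pending:
        # Send up to max_in_flight batches before seeing any response.
        window = [pending.popleft() for _ in range(min(max_in_flight, len(pending)))]
        retry = []
        for batch in window:
            if batch in fail_once and batch not in failed_already:
                failed_already.add(batch)
                retry.append(batch)      # first attempt failed, resend later
            else:
                log.append(batch)        # written to the partition log
        # Retried batches go back to the front of the queue, in order.
        pending.extendleft(reversed(retry))
    return log

print(deliver(["B1", "B2"], max_in_flight=2, fail_once={"B1"}))  # ['B2', 'B1']
print(deliver(["B1", "B2"], max_in_flight=1, fail_once={"B1"}))  # ['B1', 'B2']
```

With two requests in flight, B2 is written before the retry of B1 arrives; with one request in flight, B2 is not sent until B1 has succeeded.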

max.request.size  int, default 1048576 bytes (1 MiB)

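The maximum size of a request in bytes. This setting limits the number of record batches the producer sends in a single request, to avoid sending overly large requests. It is also effectively a cap on the maximum record batch size. Note that the server has its own cap on record batch size, which may differ from this value.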
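Putting the four settings above together, a producer configuration favoring durability and ordering might look like the sketch below. The values are illustrative choices, not defaults or recommendations from the original notes, and to_properties is a hypothetical helper that only renders the dictionary in .properties style.

```python
# Illustrative values only; tune them for your own workload.
producer_config = {
    "acks": "all",                                  # wait for all in-sync replicas
    "retries": 3,                                   # resend on failure
    "max.in.flight.requests.per.connection": 1,     # keep ordering when retrying
    "max.request.size": 1048576,                    # 1 MiB request cap
}

def to_properties(config):
    """Render a config dict as key=value lines (Java .properties style)."""
    return "\n".join(f"{key}={value}" for key, value in config.items())

print(to_properties(producer_config))
```

Setting max.in.flight.requests.per.connection to 1 trades throughput for the ordering guarantee discussed above.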

Kafka Record

•Every message published to Kafka is called a “Record”

A Record contains two parts:

Key

•Used by compaction or for message grouping

•If a key is sent, the producer guarantees that all messages for that key always go to the same partition (as long as the number of partitions does not change)

•This makes it possible to guarantee ordering for a specific key

Value

•Where the content of the data goes
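The "same key, same partition" guarantee follows from hashing the key and taking the result modulo the partition count. The sketch below illustrates the idea only: it uses zlib.crc32 as a stand-in hash, whereas Kafka's default partitioner uses murmur2, so the partition numbers it produces will not match a real cluster. The function name partition_for is hypothetical.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition (crc32 stands in for Kafka's murmur2)."""
    return zlib.crc32(key) % num_partitions

# The same key always maps to the same partition while num_partitions is fixed.
first = partition_for(b"user-42", 6)
second = partition_for(b"user-42", 6)
print(first == second)  # True
```

Because the mapping depends on num_partitions, adding partitions to a topic can send an existing key to a different partition, which breaks per-key ordering across the change.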

 

Consumer

Consumers read data from a topic

•They only have to specify the topic name and one broker to connect to, and Kafka will automatically take care of pulling the data from the right brokers

•Data is read in order within each partition
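Ordering holds per partition only, not across partitions of a topic. The toy consumer below makes that concrete; it is not the Kafka consumer API, and consume and its round-robin interleaving are assumptions made purely for illustration.

```python
from itertools import zip_longest

def consume(partitions):
    """Yield (partition, offset, value), interleaving partitions round-robin.

    partitions: dict mapping partition number to its list of values,
                where the list index is the record's offset.
    """
    streams = [
        [(p, offset, value) for offset, value in enumerate(records)]
        for p, records in sorted(partitions.items())
    ]
    for group in zip_longest(*streams):
        for record in group:
            if record is not None:
                yield record

topic = {0: ["a", "b", "c"], 1: ["x", "y"]}
for partition, offset, value in consume(topic):
    print(partition, offset, value)
```

Within each partition the offsets come out ascending, while records from different partitions are interleaved with no ordering guarantee between them.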

 

Partitions Count

•Roughly, each partition can get a throughput of 10 MB / sec

•More partitions imply:

•Better parallelism, better throughput

•BUT more files opened on your system

•BUT if a broker fails (unclean shutdown), lots of concurrent leader elections

•BUT added latency to replicate (in the order of milliseconds)

•Guidelines:

•Partitions per topic = (1 to 2) x (# of brokers), max 10 partitions

•Example: in a 3-broker setup, 3 or 6 partitions is a good number to start with
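The guideline and the rough 10 MB/sec figure above translate into simple arithmetic. The helper names below are hypothetical, and the numbers are the rules of thumb from these notes, not measured values.

```python
import math

def suggested_partitions(num_brokers, multiplier=2, cap=10):
    """Partitions per topic = (1 to 2) x (number of brokers), at most `cap`."""
    return min(num_brokers * multiplier, cap)

def partitions_for_throughput(target_mb_per_sec, per_partition_mb_per_sec=10):
    """Partitions needed if each one sustains roughly 10 MB/sec."""
    return math.ceil(target_mb_per_sec / per_partition_mb_per_sec)

print(suggested_partitions(3, multiplier=1))  # 3
print(suggested_partitions(3, multiplier=2))  # 6
print(suggested_partitions(8, multiplier=2))  # 10 (capped)
print(partitions_for_throughput(45))          # 5
```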

 

Replication Factor

•Should be at least 2, maximum of 3

•The higher the replication factor:

•Better resilience of your system (N-1 brokers can fail)

•BUT longer replication (higher latency if acks=all)

•BUT more disk space on your system (50% more if RF is 3 instead of 2)

•Guidelines:

Set it to 2 (if you have 3 brokers)

Set it to 3 (if you have more than 5 brokers)

•If replication performance is an issue, get a better broker instead of lowering the replication factor
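The trade-off above is easy to check with arithmetic: every replica stores a full copy of the data, and data survives as long as one replica is left. The function names are hypothetical and the 100 GB figure is just an example input.

```python
def disk_usage_gb(data_gb, replication_factor):
    """Total disk used across the cluster: one full copy per replica."""
    return data_gb * replication_factor

def tolerated_broker_failures(replication_factor):
    """With N replicas, N-1 brokers holding them can fail without data loss."""
    return replication_factor - 1

rf2 = disk_usage_gb(100, 2)   # 200 GB
rf3 = disk_usage_gb(100, 3)   # 300 GB
print((rf3 - rf2) / rf2)      # 0.5, i.e. 50% more disk for RF 3 than RF 2
print(tolerated_broker_failures(3))  # 2
```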

Partitions and Segments

Topics are made of partitions (we already know that)

Partitions are made of … segments (files)!

•Only one segment is ACTIVE (the one data is being written to)
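On disk, each segment file is named after the offset of its first record (the base offset), zero-padded to 20 digits. Finding the segment that holds a given offset is then a search over the sorted base offsets. The sketch below assumes that naming scheme; segment_file_for is a hypothetical helper, not part of Kafka.

```python
from bisect import bisect_right

def segment_file_for(offset, base_offsets):
    """Return the .log file name of the segment containing `offset`.

    base_offsets: sorted base offsets of the partition's segments;
                  the last one belongs to the active segment.
    """
    i = bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} is before the first segment")
    return f"{base_offsets[i]:020d}.log"

print(segment_file_for(1500, [0, 1000, 2000]))
```

An offset at or beyond the last base offset resolves to the active segment, the only one still being written to.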

 

Segments and Indexes

•Segments come with two indexes (files):

•An offset to position index: tells Kafka where in the segment file to read to find a message

•A timestamp to offset index: allows Kafka to find messages by timestamp

•Therefore, Kafka can find data in roughly constant time, without scanning the whole log!
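An offset to position lookup can be sketched as a binary search over sorted index entries. This is a toy in-memory model under the assumption that the index is sparse (only some offsets have entries); it does not reflect the real on-disk index format, and lookup_position is a hypothetical name.

```python
from bisect import bisect_right

def lookup_position(index, target_offset):
    """Return the byte position from which to start reading.

    index: sorted list of (offset, byte_position) entries for one segment.
    The result is the position of the closest indexed offset that is
    <= target_offset; the reader scans forward from there.
    """
    offsets = [offset for offset, _ in index]
    i = bisect_right(offsets, target_offset) - 1
    if i < 0:
        return 0  # before the first entry: start at the beginning of the segment
    return index[i][1]

index = [(0, 0), (50, 4096), (100, 8192)]
print(lookup_position(index, 75))   # 4096
print(lookup_position(index, 100))  # 8192
```

A timestamp to offset lookup works the same way, with (timestamp, offset) pairs in place of (offset, byte_position).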

 


Reposted from www.cnblogs.com/fangjb/p/13161949.html