Kafka Parameter Tuning

Kafka producer tuning parameters for production:

Producer:
    acks = all
    buffer.memory = 536870912
    compression.type = snappy
    retries = 100
    max.in.flight.requests.per.connection = 1    (keep at 1 so retries cannot reorder messages)
    batch.size = 10000                           (bytes, not a record count)
    max.request.size = 2097152
    request.timeout.ms = 360000                  (must be greater than replica.lag.time.max.ms)
    metadata.fetch.timeout.ms = 360000
    timeout.ms = 360000
    linger.ms = 5s                               (not used in production)
    max.block.ms = 1800000
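
A minimal Scala sketch of wiring these settings into a producer; bootstrap.servers, the serializers, and the sample message are placeholders I added, not part of the original notes:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")  // placeholder
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all")
    props.put("buffer.memory", "536870912")
    props.put("compression.type", "snappy")
    props.put("retries", "100")
    props.put("max.in.flight.requests.per.connection", "1")
    props.put("batch.size", "10000")
    props.put("max.request.size", "2097152")
    props.put("request.timeout.ms", "360000")
    props.put("max.block.ms", "1800000")

    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("onlinelogs", "a sample log line"))
    producer.close()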

Broker (CDH):
    message.max.bytes = 2560KB              (size limit for a single message)
    zookeeper.session.timeout.ms = 180000
    replica.fetch.max.bytes = 5M            (must be greater than message.max.bytes)
    num.replica.fetchers = 6
    replica.lag.max.messages = 6000
    replica.lag.time.max.ms = 15000

    log.flush.interval.messages = 10000
    log.flush.interval.ms = 5s
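
The broker expects raw byte and millisecond values, so here are the same settings spelled out as they might appear in server.properties (the unit conversions are mine, not from the original notes):

    message.max.bytes=2621440            # 2560 KB
    zookeeper.session.timeout.ms=180000
    replica.fetch.max.bytes=5242880      # 5 MB, must stay >= message.max.bytes
    num.replica.fetchers=6
    replica.lag.max.messages=6000
    replica.lag.time.max.ms=15000
    log.flush.interval.messages=10000
    log.flush.interval.ms=5000           # 5 s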

Consumer (for Spark Streaming; see https://issues.apache.org/jira/browse/SPARK-22968):

    val kafkaParams = Map[String, Object](
      "max.partition.fetch.bytes" -> (5242880: java.lang.Integer),  // default: 1048576
      "request.timeout.ms"        -> (90000: java.lang.Integer),    // default: 60000
      "session.timeout.ms"        -> (60000: java.lang.Integer),    // default: 30000
      "heartbeat.interval.ms"     -> (5000: java.lang.Integer),
      "receive.buffer.bytes"      -> (10485760: java.lang.Integer)
    )

Minor changes required for Kafka 0.10 and the new consumer, compared to laughing_man's answer (quoted from Stack Overflow):

Broker:   no changes; you still need to increase the properties message.max.bytes
          and replica.fetch.max.bytes. message.max.bytes has to be equal to or
          smaller(*) than replica.fetch.max.bytes.
Producer: increase max.request.size to send the larger message.
Consumer: increase max.partition.fetch.bytes to receive larger messages.

(*) Read the comments to learn more about message.max.bytes <= replica.fetch.max.bytes.

2. What the consumer receives
ConsumerRecord(
    topic = onlinelogs, partition = 0,
    offset = 1452002, CreateTime = -1, checksum = 3849965367,
    serialized key size = -1, serialized value size = 305,
    key = null,
    value = {"hostname":"yws76","servicename":"namenode",
             "time":"2018-03-21 20:11:30,090","logtype":"INFO",
             "loginfo":"org.apache.hadoop.hdfs.server.namenode.FileJournalManager:
              Finalizing edits file /dfs/nn/current/edits_inprogress_0000000000001453017 -> /dfs/nn/current/edits_0000000000001453017-0000000000001453030"})
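
A small Scala sketch of reading those same fields off ConsumerRecord objects; the consumer construction and poll loop are assumed to exist elsewhere:

    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.ConsumerRecords

    // records: the ConsumerRecords[String, String] returned by consumer.poll(...)
    def dump(records: ConsumerRecords[String, String]): Unit =
      for (r <- records.asScala)
        println(s"topic=${r.topic} partition=${r.partition} offset=${r.offset} " +
                s"key=${r.key} valueSize=${r.serializedValueSize} value=${r.value}")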

2.1 Explains the curve chart discussed earlier.
2.2 key = null: the partitioning strategy

Key is not null: Utils.abs(key.hashCode) % numPartitions
key = null: the producer picks the partition itself; for how the Kafka source handles this, see
http://www.2bowl.info/kafka%E6%BA%90%E7%A0%81%E8%A7%A3%E8%AF%BB-key%E4%B8%BAnulll%E6%97%B6kafka%E5%A6%82%E4%BD%95%E9%80%89%E6%8B%A9%E5%88%86%E5%8C%BApartition/
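
As a rough illustration of the two cases, a Scala sketch (mine, not Kafka's actual source; the new Java producer hashes the key bytes with murmur2 when a key is present and round-robins over the available partitions when it is null):

    import java.util.Random
    import java.util.concurrent.atomic.AtomicInteger

    object PartitionerSketch {
      // round-robin counter, started at a random offset as the new producer does
      private val counter = new AtomicInteger(new Random().nextInt())

      def choosePartition(key: AnyRef, numPartitions: Int): Int =
        if (key != null)
          (key.hashCode & 0x7fffffff) % numPartitions               // keyed: same key -> same partition
        else
          (counter.getAndIncrement() & 0x7fffffff) % numPartitions  // null key: spread round-robin
    }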

3.1
Notes on setting up a custom Kafka parcel repository for installing the Kafka service on CDH, and on clearing the mines when the install refused to go through:
http://blog.itpub.net/30089851/viewspace-2136372/

3.2
A power failure corrupted a Kafka topic.
Symptom: in the CDH web UI the Kafka process shows green. We usually take green to
         mean the process is OK; not so here. Producers and consumers could not
         work and threw exceptions.

Process:
      Check the broker logs on the machine:
        kafka.common.NotAssignedReplicaException:
        Leader 186 failed to record follower 191's position -1
        since the replica is not recognized to be one of the assigned replicas 186
        for partition [__consumer_offsets,3].

Recovery:
1. Stop the service and delete the Kafka log directories on the broker nodes.
2. Delete Kafka's metadata in ZooKeeper.
3. Reinstall Kafka and recreate the topics.

Thinking points:
1. When we replay the data, what about duplicates?
   HBase's Put API is insert + update (an upsert), so replays are idempotent; see the sketch after this list.

2. What if the data lands on HDFS instead?
   From which version does Hive support UPDATE, and with which parameters? (Hive 0.14+, with hive.support.concurrency=true and hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager, on bucketed transactional ORC tables.)

3. Ordering is guaranteed within a partition; how do we guarantee ordering across multiple partitions? (Kafka 0.11)
   For the same key, the end state depends on the order in which the operations are seen:
   insert
   delete
   insert --> delete
   delete --> insert
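
On question 1: an HBase Put with the same rowkey simply overwrites the row, which is what makes replays safe. A minimal sketch; the table name, column family, and rowkey scheme are made up for illustration:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("onlinelogs"))  // hypothetical table

    // rowkey derived deterministically from the message (e.g. hostname + timestamp),
    // so replaying the same message rewrites the same cell: insert + update, no duplicates
    val put = new Put(Bytes.toBytes("yws76_2018-03-21 20:11:30,090"))
    put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("loginfo"), Bytes.toBytes("..."))
    table.put(put)
    table.close()
    conn.close()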


Reprinted from blog.csdn.net/qq_15300683/article/details/80774133