The previous article covered the metadata-update part of the producer's send path; this one focuses on the producer's partitioning and interceptors.
private Future&lt;RecordMetadata&gt; doSend(ProducerRecord&lt;K, V&gt; record, Callback callback) {
    TopicPartition tp = null;
    try {
        // first make sure the metadata for the topic is available
        ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
        long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
        Cluster cluster = clusterAndWaitTime.cluster;
        byte[] serializedKey;
        try {
            serializedKey = keySerializer.serialize(record.topic(), record.key());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in key.serializer");
        }
        byte[] serializedValue;
        try {
            serializedValue = valueSerializer.serialize(record.topic(), record.value());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in value.serializer");
        }
        int partition = partition(record, serializedKey, serializedValue, cluster);
        int serializedSize = Records.LOG_OVERHEAD + Record.recordSize(serializedKey, serializedValue);
        ensureValidRecordSize(serializedSize);
        tp = new TopicPartition(record.topic(), partition);
        long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
        log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
        // producer callback will make sure to call both 'callback' and interceptor callback
        Callback interceptCallback = this.interceptors == null ? callback : new InterceptorCallback&lt;&gt;(callback, this.interceptors, tp);
        RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, interceptCallback, remainingWaitMs);
        if (result.batchIsFull || result.newBatchCreated) {
            log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
            this.sender.wakeup();
        }
        return result.future;
        // NOTE: the catch blocks that handle send-time exceptions are elided in this excerpt
    }
}
As doSend shows, right after the ClusterAndWaitTime lookup come serializedKey and serializedValue: the key and value are serialized into the bytes that will actually be sent to Kafka. Note that in our earlier send call, producer.send(new ProducerRecord&lt;&gt;(topic, UUID.randomUUID().toString(), String.valueOf(i))), we supplied a key ourselves, namely UUID.randomUUID().toString(). The partition a message lands on is decided from its key: when a key is present, the producer partitions by the hash of the key. But what if no key is specified? Let's keep reading the source to see how the producer assigns a partition in both the keyed and keyless cases.
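As a side note on the serialization step: for string keys and values the serializer is essentially a UTF-8 encoding. The sketch below is a stand-alone re-implementation of what Kafka's StringSerializer does (the class and method names here are my own, not the real Kafka class):

```java
import java.nio.charset.StandardCharsets;

public class StringSerializeSketch {
    // Roughly what Kafka's StringSerializer does: UTF-8-encode the string, or pass null through
    static byte[] serialize(String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(serialize("my-key").length); // 6 bytes for the ASCII string "my-key"
    }
}
```

The resulting serializedKey byte array is exactly what the partitioner hashes below.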
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
    List&lt;PartitionInfo&gt; partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    if (keyBytes == null) {
        int nextValue = nextValue(topic);
        List&lt;PartitionInfo&gt; availablePartitions = cluster.availablePartitionsForTopic(topic);
        if (availablePartitions.size() > 0) {
            int part = Utils.toPositive(nextValue) % availablePartitions.size();
            return availablePartitions.get(part).partition();
        } else {
            // no partitions are available, give a non-available partition
            return Utils.toPositive(nextValue) % numPartitions;
        }
    } else {
        // hash the keyBytes to choose a partition
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
}
private int nextValue(String topic) {
    AtomicInteger counter = topicCounterMap.get(topic);
    if (null == counter) {
        counter = new AtomicInteger(new Random().nextInt());
        AtomicInteger currentCounter = topicCounterMap.putIfAbsent(topic, counter);
        if (currentCounter != null) {
            counter = currentCounter;
        }
    }
    return counter.getAndIncrement();
}
The default partitioner is DefaultPartitioner under org.apache.kafka.clients.producer.internals. As the code shows, when the key is null it calls nextValue(topic) to get a per-topic counter value that increments on every send, then takes that value modulo the number of available partitions (falling back to all partitions when none are listed as available), so keyless messages are spread round-robin across the topic's partitions. When a key is given, the partition is chosen by hashing the key bytes with murmur2.
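To see the keyless round-robin logic in isolation, here is a self-contained sketch of the counter-plus-modulo scheme. The toPositive method re-implements the trick used by Kafka's Utils.toPositive; the class and variable names are my own:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {
    // Same trick as Kafka's Utils.toPositive: mask off the sign bit instead of calling
    // Math.abs, which would misbehave on Integer.MIN_VALUE
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(0); // Kafka seeds this with a random int
        int numPartitions = 3;
        for (int i = 0; i < 6; i++) {
            // successive keyless sends walk the partitions: 0, 1, 2, 0, 1, 2
            System.out.println(toPositive(counter.getAndIncrement()) % numPartitions);
        }
    }
}
```

The sign-bit mask matters because the counter is seeded with a random int that may be negative and will eventually overflow; masking keeps the modulo result a valid partition index in every case.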
Before the partitioner runs there is actually another hook: interceptors, much like Flume's interceptors. When a KafkaProducer is constructed it runs

this.interceptors = interceptorList.isEmpty() ? null : new ProducerInterceptors&lt;&gt;(interceptorList);

so interceptors are applied if any are configured; otherwise the field stays null. Below is sample code for a custom partitioning strategy and a custom interceptor, for reference only.
// Route messages with key "a" to partition 1, other keyed messages to a random partition,
// and keyless messages (key == null) to partition 0
public class MyPartition implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List&lt;PartitionInfo&gt; partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes != null) {
            if (key.equals("a")) {
                return 1;
            } else {
                return new Random().nextInt(numPartitions);
            }
        } else {
            return 0;
        }
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map&lt;String, ?&gt; map) {
    }
}
// Custom interceptor: onSend filters out messages whose key is odd; onAcknowledgement
// prints the metadata of successfully acknowledged messages on topic "test", partition 0
public class ProducerInterceptorDemo implements ProducerInterceptor&lt;Integer, Integer&gt; {
    @Override
    public ProducerRecord&lt;Integer, Integer&gt; onSend(ProducerRecord&lt;Integer, Integer&gt; producerRecord) {
        if (producerRecord.key() % 2 == 0) {
            System.out.println(producerRecord.value());
            return producerRecord; // even key: pass the record through
        }
        return null; // odd key: drop the record
    }

    @Override
    public void onAcknowledgement(RecordMetadata recordMetadata, Exception e) {
        if (recordMetadata != null &amp;&amp; "test".equals(recordMetadata.topic())
                &amp;&amp; recordMetadata.partition() == 0) {
            // on success, print the metadata for topic "test", partition 0
            System.out.println(recordMetadata.toString());
        } else if (e != null) { // guard: e is null on a successful acknowledgement
            e.printStackTrace();
        }
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map&lt;String, ?&gt; map) {
    }
}
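To actually take effect, the two classes above have to be registered in the producer configuration. A minimal sketch, assuming the usual Properties-based producer setup from the earlier articles (the property keys are the standard kafka-clients ProducerConfig constants):

```java
// plug the custom partitioner and interceptor into the producer config
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, MyPartition.class.getName());
props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, ProducerInterceptorDemo.class.getName());
```

Note that INTERCEPTOR_CLASSES_CONFIG takes a list, so several interceptor class names can be supplied and they run in order.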
To be continued...