Writing Data to Kafka

Developers can use Kafka's built-in client API to develop applications.

Producers

Applications write messages to Kafka for many purposes: recording user activity, recording metrics, saving log messages, recording information from smart appliances, asynchronous communication with other applications, etc.

Message send flow

  1. Sending begins with creating a ProducerRecord object, which must contain the target topic and a value; a key and a partition are optional. When the ProducerRecord is sent, the producer first serializes the key and value objects into byte arrays.
  2. The data is then passed to the partitioner. If a partition was specified in the record, the partitioner simply uses it; otherwise it chooses one, usually by hashing the key. The record is added to a batch of records that will be sent to the same topic and partition, and a separate thread sends the batch to the broker.
  3. When the broker receives the messages, it returns a response. On success, it returns a RecordMetaData object containing the topic, the partition, and the record's offset within the partition. On failure, it returns an error, and the producer may retry sending the message several times before giving up.

Creating Producer Code

// The three mandatory producer properties: the broker list and the serializers
private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
// Serializer classes for the message key and value
kafkaProps.put("key.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");
kafkaProps.put("value.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");
producer = new KafkaProducer<String, String>(kafkaProps);

Three ways to send messages

  • Fire-and-forget. The message is sent to the server, and we don't care whether it arrived.
  • Synchronous send. send() returns a Future object; calling get() waits until the response arrives and tells us whether the message was sent successfully.
  • Asynchronous send. Call send() with a callback function, which is triggered when the server returns a response. (All three styles are sketched after this list.)
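
A minimal sketch of the three styles, reusing the producer created above; the topic name "CustomerCountry" and the key/value strings are hypothetical placeholders:

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

ProducerRecord<String, String> record =
        new ProducerRecord<>("CustomerCountry", "Precision Products", "France");

// Fire-and-forget: ignore the returned Future; errors are silently dropped
producer.send(record);

// Synchronous send: block on the Future to learn whether the write succeeded
try {
    RecordMetadata metadata = producer.send(record).get();
    System.out.println("Written to offset " + metadata.offset());
} catch (Exception e) {
    e.printStackTrace();
}

// Asynchronous send: the callback runs when the broker returns a response
producer.send(record, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception e) {
        if (e != null) {
            e.printStackTrace();
        }
    }
});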

Serializer

When creating a producer object, you must specify serializers.

If the objects sent to Kafka are not simple strings or integers, you can use a serialization framework such as Avro, Thrift, or Protobuf to create message records, or write a custom serializer.

Custom Serialization
// A simple customer class
public class Customer {
    private int customerID;
    private String customerName;

    public Customer(int ID, String name) {
        this.customerID = ID;
        this.customerName = name;
    }

    public int getID() {
        return customerID;
    }

    public String getName() {
        return customerName;
    }
}

// A serializer for the Customer class
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;

import java.nio.ByteBuffer;
import java.util.Map;

public class CustomerSerializer implements Serializer<Customer> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // nothing to configure
    }

    /**
     * A Customer object is serialized as:
     * a 4-byte integer representing customerID,
     * a 4-byte integer representing the length of customerName
     * (0 if customerName is null),
     * and the N bytes of customerName.
     */
    @Override
    public byte[] serialize(String topic, Customer data) {
        try {
            if (data == null)
                return null;

            byte[] serializedName;
            int stringSize;
            if (data.getName() != null) {
                serializedName = data.getName().getBytes("UTF-8");
                stringSize = serializedName.length;
            } else {
                serializedName = new byte[0];
                stringSize = 0;
            }

            ByteBuffer buffer = ByteBuffer.allocate(4 + 4 + stringSize);
            buffer.putInt(data.getID());
            buffer.putInt(stringSize);
            buffer.put(serializedName);
            return buffer.array();
        } catch (Exception e) {
            throw new SerializationException(
                "Error when serializing Customer to byte[]: " + e);
        }
    }

    @Override
    public void close() {
        // nothing to close
    }
}
Using Avro Serialization

Avro serializes data into a compact binary format (or JSON). Avro requires a schema when reading and writing, and the schema is generally embedded in the data file. Its key feature: when writers start using a new schema, applications responsible for reading the data can keep working without modification.
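
A minimal sketch of producer configuration for Avro, assuming Confluent's KafkaAvroSerializer, a Schema Registry running at a placeholder URL, and that Customer here is an Avro-generated class rather than the plain class above:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("key.serializer",
    "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("value.serializer",
    "io.confluent.kafka.serializers.KafkaAvroSerializer");
// The serializer fetches and registers schemas with the Schema Registry
props.put("schema.registry.url", "http://localhost:8081"); // placeholder URL
Producer<String, Customer> producer = new KafkaProducer<>(props);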

Partition

If the key is null, the default partitioner uses round-robin to distribute messages evenly across the topic's partitions. If the key is not null, the partitioner hashes the key and maps it to a partition, so the same key always goes to the same partition.

Custom partitioning strategy
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.record.InvalidRecordException;
import org.apache.kafka.common.utils.Utils;

public class BananaPartitioner implements Partitioner {

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();

        if ((keyBytes == null) || (!(key instanceof String)))
            throw new InvalidRecordException(
                "We expect all messages to have customer name as key");

        if (((String) key).equals("Banana"))
            return numPartitions - 1; // Banana always goes to the last partition

        // Other records are hashed to the remaining partitions
        return Math.abs(Utils.murmur2(keyBytes)) % (numPartitions - 1);
    }

    @Override
    public void close() {}
}
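
To plug the partitioner in, set the producer's partitioner.class property to its fully qualified class name; the package below is a hypothetical placeholder:

kafkaProps.put("partitioner.class", "com.example.BananaPartitioner");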
