Getting started with Kafka

A hands-on Kafka walkthrough

1. Introduction to Kafka

 

1.1. Main functions

According to the official website, Apache Kafka® is a distributed streaming platform with three main capabilities:

  1: It lets you publish and subscribe to streams of records. This is similar to a message queue, which is why Kafka is often classified as a message queue framework.

  2: It lets you store streams of records in a fault-tolerant way. Kafka persists message streams to files on disk.

  3: It lets you process streams of records as they occur, so messages can be processed as soon as they are published.

 

 

1.2. Usage scenarios

1: Building real-time streaming data pipelines that reliably get data between systems or applications (the message-queue role).

2: Building real-time streaming applications that transform or react to the streams of data (the data-processing role).

 

1.3. Details

Kafka is currently used mainly as a distributed publish-subscribe messaging system. The following sections briefly introduce Kafka's basic mechanics.

  1.3.1 Message Transmission Process

 

    A producer sends messages to the Kafka cluster. Before sending a message, it assigns the message a category, that is, a topic. For example, two producers might send messages under topic1 while a third sends messages under topic2.

    A topic is a category for messages. Assigning a topic to each message classifies it, so consumers only need to pay attention to the topics they care about.

    A consumer continuously pulls messages from the cluster over a persistent connection to the Kafka cluster, and then processes them.

    Note that the numbers of producers and consumers for a given topic need not match.

  1.3.2 Kafka server message storage strategy

 

    Any discussion of Kafka's storage has to mention partitions. When creating a topic, you can specify the number of partitions at the same time: more partitions allow greater throughput, but they also require more resources and increase the risk of unavailability. After Kafka receives a message from a producer, it stores the message in one of the topic's partitions according to a balancing strategy.

 

  Within each partition, messages are stored in order: the most recently received message is consumed last.

  1.3.3 Interaction with producers

 

    When a producer sends a message to the Kafka cluster, it can deliver the message to a specific partition by specifying that partition explicitly.

    It can also spread messages across partitions by specifying a balancing strategy.

    If neither is specified, the default partitioner is used to balance messages across the partitions, as sketched below.
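
A rough sketch of these three options using the Java client's ProducerRecord constructors (the class name, topic name, keys, and values below are placeholders, not from the original article; the full producer setup appears in section 3.2):

import org.apache.kafka.clients.producer.ProducerRecord;

public class PartitionTargeting {
    public static void main(String[] args) {
        // 1. Explicit partition: this record always goes to partition 0 of the topic
        ProducerRecord<String, String> explicit =
                new ProducerRecord<String, String>("topic-test", 0, "key", "value");
        // 2. Keyed record: the partitioner hashes the key to choose a partition
        ProducerRecord<String, String> keyed =
                new ProducerRecord<String, String>("topic-test", "key", "value");
        // 3. No key and no partition: the default partitioner balances records across partitions
        ProducerRecord<String, String> balanced =
                new ProducerRecord<String, String>("topic-test", "value");
    }
}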

  1.3.4 Interaction with consumers

  

    When consumers consume messages, Kafka uses an offset to record the current consumption position.

    In Kafka's design, multiple different groups can consume messages from the same topic at the same time. For example, two different groups can consume simultaneously; each keeps its own offset, so they do not interfere with each other.

    For a single group, the number of consumers should not exceed the number of partitions, because within a group each partition can be bound to at most one consumer. In other words, one consumer can consume several partitions, but one partition can be consumed by only one consumer in the group.

    Therefore, if a group has more consumers than there are partitions, the extra consumers will not receive any messages.
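
A minimal sketch of this group behavior (not from the original article): assuming a topic named test with at least two partitions, starting the class below twice with the same group.id makes Kafka split the partitions between the two instances. The class name, group name, and broker address are placeholders.

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupMember {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "demo-group");              // same group.id = partitions are shared
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("test"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            // Printing the partition shows which partitions this instance was assigned
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("partition = %d, offset = %d, value = %s%n",
                        record.partition(), record.offset(), record.value());
        }
    }
}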

2. Kafka installation and use

 

2.1. Download

  You can download the latest Kafka release from the official website at http://kafka.apache.org/downloads ; choose the binary .tgz package. Depending on your network, a proxy may be required to reach it. The version used here is 0.11.0.1, the latest release at the time of writing.

 

2.2. Installation

  Kafka is a program written in Scala that runs on the JVM. Although it can also run on Windows, Kafka is almost always deployed on Linux servers, so we will use Linux for this walkthrough as well.

  First make sure that a JDK is installed on your machine; Kafka needs a Java runtime. Older Kafka releases also required a separate ZooKeeper installation, but recent releases ship with a built-in ZooKeeper, which we can use directly.

  That covers installation: for the simplest possible trial, we only need to extract the archive to any directory. Here we extract the Kafka tarball to the /home directory.
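
  For example, assuming the Scala 2.11 build of 0.11.0.1 downloaded above (adjust the file name to match your actual archive):

    tar -xzf kafka_2.11-0.11.0.1.tgz -C /home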

 

2.3. Configuration

  There is a config folder in the Kafka decompression directory; this is where the configuration files live.

  consumer.properties: consumer configuration, used to configure the consumers started in section 2.5. The defaults are fine here.

  producer.properties: producer configuration, used to configure the producers started in section 2.5. The defaults are fine here.

  server.properties: Kafka server configuration, used to configure the Kafka server itself. Only a few basic settings are introduced here:

    1. broker.id declares the unique ID of this Kafka server within the cluster. It must be an integer, and every Kafka server in a cluster must have a distinct id. The default is fine here.
    2. listeners declares the address and port this Kafka server listens on. If you are running on the local machine or a local virtual machine, you do not need to configure this; the localhost address is used by default. If Kafka runs on a remote server, you must configure it, for example:

          listeners=PLAINTEXT://192.168.180.128:9092, and make sure the server's port 9092 is reachable.

    3. zookeeper.connect declares the address of the ZooKeeper instance Kafka connects to. Since we are using the ZooKeeper bundled with this Kafka release, the default is fine:

          zookeeper.connect=localhost:2181
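
  Putting these together, a minimal server.properties for a single broker reachable from other machines might look like this (192.168.180.128 is the example address used throughout this article; log.dirs is shown with its usual default):

broker.id=0
listeners=PLAINTEXT://192.168.180.128:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181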

2.4. Running

  1. Start ZooKeeper

cd into the kafka decompression directory and enter

bin/zookeeper-server-start.sh config/zookeeper.properties

After ZooKeeper starts successfully, its startup log appears in the terminal.

    2. Start Kafka

cd into the kafka decompression directory and enter

bin/kafka-server-start.sh config/server.properties

After Kafka starts successfully, its startup log appears in the terminal.

 

2.5. The first message

   2.5.1 Create a topic

    Kafka manages messages of the same type through topics: putting data of the same type under a single topic makes it easier to process.

    Open a terminal in the kafka decompression directory and enter

    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

    This creates a topic named test.

 

         After creating a topic, you can enter

            bin/kafka-topics.sh --list --zookeeper localhost:2181

   to list the topics that have been created.
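
   To also see the partition layout of a topic, the same script provides a --describe option:

            bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test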

  2.5.2 Create a message consumer

   Open a terminal in the kafka decompression directory and enter

    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

   to create a consumer for the topic test.

 

 

         After the consumer is created, nothing is printed yet, because no data has been sent.

         Don't worry, and don't close this terminal; open a new one, and let's create the first message producer.

  2.5.3 Create a message producer

    Open a new terminal in the kafka decompression directory and enter

    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

    After the command runs, you enter an interactive prompt; every line you type is sent to the Kafka cluster as one message.
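
    For example, typing the following two lines at the prompt (the text is arbitrary) sends two messages:

    > hello kafka
    > this is my first message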

 

After sending the messages, switch back to the consumer terminal: the messages you just sent are printed there.

 

3. Using a Java program

    As in the previous section, let's now try using Kafka from a Java program.

    3.1 Create a topic

import java.util.ArrayList;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.CreateTopicsResult;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicCreator {
    public static void main(String[] args) {
        // Create a topic named "topic-test" with 1 partition and replication factor 1
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.180.128:9092");
        AdminClient adminClient = AdminClient.create(props);
        ArrayList<NewTopic> topics = new ArrayList<NewTopic>();
        NewTopic newTopic = new NewTopic("topic-test", 1, (short) 1);
        topics.add(newTopic);
        CreateTopicsResult result = adminClient.createTopics(topics);
        try {
            result.all().get(); // block until the broker confirms creation
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        }
    }
}

  We use the AdminClient API to manage the Kafka server's configuration. Here the NewTopic(String name, int numPartitions, short replicationFactor) constructor creates a topic named "topic-test" with 1 partition and a replication factor of 1.

3.2 Producer: sending messages

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.180.128:9092");
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("linger.ms", 1);
        props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        // Send 100 messages whose key and value are both the loop index
        for (int i = 0; i < 100; i++)
            producer.send(new ProducerRecord<String, String>("topic-test", Integer.toString(i), Integer.toString(i)));

        producer.close();
    }
}

After the producer has sent messages, you can watch them with the console consumer from section 2.5, or consume them with the Java consumer program described next.

3.3 Consumer: consuming messages

import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.180.128:9092");
        props.put("group.id", "test");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        final KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("topic-test"), new ConsumerRebalanceListener() {
            public void onPartitionsRevoked(Collection<TopicPartition> collection) {
            }
            public void onPartitionsAssigned(Collection<TopicPartition> collection) {
                // Reset the offset to the beginning of each assigned partition
                consumer.seekToBeginning(collection);
            }
        });
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
        }
    }
}

Here we use the Consumer API to create an ordinary Java consumer program listening on the topic named "topic-test". Whenever a producer sends a message to the Kafka server, our consumer receives it.

4. Using spring-kafka

Spring-kafka is a Spring sub-project, currently in incubation, that uses Spring's features to make working with Kafka easier.

4.1 Basic configuration information

As with other Spring projects, configuration comes first. Here we use Java configuration to set up our Kafka consumers and producers.

  1. Import the Maven dependencies

<!-- kafka start -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.11.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>0.11.0.1</version>
</dependency>
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
    <version>1.3.0.RELEASE</version>
</dependency>

  2. Create the configuration class

We create a new class named KafkaConfig in the main directory

@Configuration
@EnableKafka
public class KafkaConfig {

}

  3. Configure topics

Add the following configuration to the KafkaConfig class:

//topic config start
    @Bean
    public KafkaAdmin admin() {
        Map<String, Object> configs = new HashMap<String, Object>();
        configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.180.128:9092");
        return new KafkaAdmin(configs);
    }

    @Bean
    public NewTopic topic1() {
        // Note: the replication factor must not exceed the number of brokers in the cluster
        return new NewTopic("foo", 10, (short) 2);
    }
//topic config end

 

  4. Configure the producer factory and template

//producer config start
    @Bean
    public ProducerFactory<Integer, String> producerFactory() {
        return new DefaultKafkaProducerFactory<Integer, String>(producerConfigs());
    }
    @Bean
    public Map<String, Object> producerConfigs() {
        Map<String, Object> props = new HashMap<String, Object>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.180.128:9092");
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("linger.ms", 1);
        props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
    @Bean
    public KafkaTemplate<Integer, String> kafkaTemplate() {
        return new KafkaTemplate<Integer, String>(producerFactory());
    }
//producer config end

  5. Configure the consumer factory

//consumer config start
    @Bean
    public ConcurrentKafkaListenerContainerFactory<Integer, String> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<Integer, String> factory = new ConcurrentKafkaListenerContainerFactory<Integer, String>();
        factory.setConsumerFactory(consumerFactory());
        return factory;
    }

    @Bean
    public ConsumerFactory<Integer, String> consumerFactory() {
        return new DefaultKafkaConsumerFactory<Integer, String>(consumerConfigs());
    }

    @Bean
    public Map<String, Object> consumerConfigs() {
        HashMap<String, Object> props = new HashMap<String, Object>();
        props.put("bootstrap.servers", "192.168.180.128:9092");
        props.put("group.id", "test");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
//consumer config end
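
One detail worth noting: the bean name kafkaListenerContainerFactory matters here, because it is the default factory name that @KafkaListener looks up when no containerFactory attribute is specified, so no extra wiring is needed.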

 

 

4.2 Create a message producer

import java.util.concurrent.ExecutionException;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.util.concurrent.ListenableFutureCallback;

// Send one message with the spring-kafka template; to send several, simply loop
public class TemplateProducer {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext(KafkaConfig.class);
        KafkaTemplate<Integer, String> kafkaTemplate = (KafkaTemplate<Integer, String>) ctx.getBean("kafkaTemplate");
        String data = "this is a test message";
        ListenableFuture<SendResult<Integer, String>> send = kafkaTemplate.send("topic-test", 1, data);
        send.addCallback(new ListenableFutureCallback<SendResult<Integer, String>>() {
            public void onFailure(Throwable throwable) {
                // called if the send fails
            }
            public void onSuccess(SendResult<Integer, String> result) {
                // called once the broker acknowledges the message
            }
        });
    }
}
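
If you need to block until the broker acknowledges the send, for instance in a short-lived command-line program like the one above, you can wait on the returned future instead of registering a callback; this is plain java.util.concurrent.Future behavior rather than a spring-kafka-specific API. Continuing the example above:

// Block until the send completes (throws on failure)
SendResult<Integer, String> result = kafkaTemplate.send("topic-test", 1, data).get();
System.out.println("sent to partition " + result.getRecordMetadata().partition());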

 

4.3 Create a message consumer

We first create a message-listener class. Whenever the topic named "topic-test" receives a message, our listen method is called.

import java.util.concurrent.CountDownLatch;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.annotation.KafkaListener;

public class SimpleConsumerListener {
    private final static Logger logger = LoggerFactory.getLogger(SimpleConsumerListener.class);
    private final CountDownLatch latch1 = new CountDownLatch(1);

    @KafkaListener(id = "foo", topics = "topic-test")
    public void listen(byte[] records) {
        // do something with the message here
        this.latch1.countDown();
    }
}

We also need to register this class as a bean in KafkaConfig:

@Bean
public SimpleConsumerListener simpleConsumerListener(){
    return new SimpleConsumerListener();
}

By default, spring-kafka creates one thread for each listener method to pull messages from the Kafka server.
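
If one thread per listener is not enough, the container factory from section 4.1 can run several consumer threads; setConcurrency is part of the spring-kafka API, and, for the reasons given in section 1.3.4, the value should not exceed the topic's partition count. A minimal sketch, reusing the consumerFactory() bean defined earlier:

@Bean
public ConcurrentKafkaListenerContainerFactory<Integer, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<Integer, String> factory =
            new ConcurrentKafkaListenerContainerFactory<Integer, String>();
    factory.setConsumerFactory(consumerFactory());
    factory.setConcurrency(3); // three consumer threads, each bound to a subset of the partitions
    return factory;
}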

 
