Reprinted from https://www.cnblogs.com/hei12138/p/7805475.html
Introduction to Kafka
1.1. Main functions
According to the official website, Apache Kafka® is a distributed streaming platform with three main capabilities:
1: It lets you publish and subscribe to streams of records. This is similar to a message queue, which is why Kafka is often classified as a message queue framework.
2: It lets you store streams of records in a fault-tolerant way. Kafka persists message streams to disk as files.
3: It lets you process streams of records as they occur, that is, as soon as messages are published.
1.2. Usage scenarios
1: Building real-time streaming data pipelines that reliably get data between systems or applications. This is the message queue role.
2: Building real-time streaming applications that transform or react to the streams of data. This is the data processing role.
1.3. Details
At present Kafka is mainly used as a distributed publish-subscribe messaging system. The following briefly introduces its basic mechanisms.
1.3.1 Message Transmission Process
Producer: a producer sends messages to the Kafka cluster. Before sending, it classifies each message by assigning it a topic. In the figure from the original post, two producers send messages classified as topic1 and a third sends messages classified as topic2.
Topic: assigning a topic to a message classifies it, so consumers can subscribe only to the topics they need.
Consumer: a consumer keeps a long-lived connection to the Kafka cluster, continuously pulls messages from it, and then processes them.
The figure also shows that the number of consumers and the number of producers on the same topic do not have to match.
1.3.2 Kafka server message storage strategy
When it comes to Kafka storage, we have to mention partitions. When creating a topic, you can specify its number of partitions. More partitions generally allow higher throughput, but they also require more resources and can lead to higher unavailability. After Kafka receives a message from a producer, the message is stored in one of the topic's partitions according to the balancing strategy.
Within each partition, messages are stored sequentially in the order they arrive, so the most recently received message is consumed last.
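As a rough illustration of this storage model, the sketch below treats a topic as a fixed number of append-only lists. The class and method names (PartitionedLog, append, read) are hypothetical and are not part of the Kafka API, and the round-robin choice of partition is only an assumed stand-in for the balancing strategy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a topic; not part of the Kafka API.
class PartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();
    private int next = 0; // round-robin cursor (assumed balancing strategy)

    PartitionedLog(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Appends a message to the next partition and returns {partition, offset}.
    int[] append(String message) {
        int p = next;
        next = (next + 1) % partitions.size();
        List<String> log = partitions.get(p);
        log.add(message);
        return new int[] {p, log.size() - 1};
    }

    // Reads one partition from the given offset, in arrival order.
    List<String> read(int partition, int fromOffset) {
        List<String> log = partitions.get(partition);
        return new ArrayList<>(log.subList(fromOffset, log.size()));
    }

    public static void main(String[] args) {
        PartitionedLog topic = new PartitionedLog(2);
        for (String m : new String[] {"a", "b", "c", "d"}) {
            int[] pos = topic.append(m);
            System.out.println(m + " -> partition " + pos[0] + ", offset " + pos[1]);
        }
        // Order is only guaranteed inside a single partition.
        System.out.println("partition 0: " + topic.read(0, 0));
    }
}
```

Note that ordering holds only within one partition: messages that land in different partitions have no defined order relative to each other.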
1.3.3 Interaction with producers
When the producer sends a message to the Kafka cluster, it can send it to a specific partition by naming that partition.
It can also spread messages across partitions by specifying a balancing strategy (a partitioner).
If neither is specified, the default partitioner is used: in the 0.11 client it hashes the record key when a key is present, and otherwise distributes records across the partitions in turn.
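The three cases above can be sketched as follows. This is a simplified illustration rather than Kafka's own partitioner: PartitionChooser is a hypothetical class, String.hashCode() stands in for whatever hash a real client uses, and rotating keyless records through the partitions is just one possible balancing strategy.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of producer-side partition selection; not Kafka's partitioner.
class PartitionChooser {
    private final int numPartitions;
    private final AtomicInteger counter = new AtomicInteger(0);

    PartitionChooser(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // explicitPartition and key may each be null.
    int choose(Integer explicitPartition, String key) {
        if (explicitPartition != null) {
            // Case 1: the caller named the partition.
            if (explicitPartition < 0 || explicitPartition >= numPartitions) {
                throw new IllegalArgumentException("no such partition: " + explicitPartition);
            }
            return explicitPartition;
        }
        if (key != null) {
            // Case 2: the same key always maps to the same partition.
            // The mask clears the sign bit so the result is never negative.
            return (key.hashCode() & 0x7fffffff) % numPartitions;
        }
        // Case 3: neither partition nor key, so rotate through the partitions.
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        PartitionChooser chooser = new PartitionChooser(3);
        System.out.println("explicit: " + chooser.choose(2, "ignored"));
        System.out.println("by key:   " + chooser.choose(null, "order-42"));
        System.out.println("keyless:  " + chooser.choose(null, null));
        System.out.println("keyless:  " + chooser.choose(null, null));
    }
}
```

Routing by key is what keeps all messages for one key in order, since they all end up in the same partition.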
1.3.4 Interaction with consumers
When consumers consume messages, Kafka uses an offset to record the current consumption position.
By design, multiple consumer groups can consume the messages of the same topic at the same time. In the figure from the original post, two different groups consume simultaneously; each group keeps its own offsets, so the groups do not interfere with each other.
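To make the idea of independent offsets concrete, here is a minimal sketch that tracks one offset per group over a single partition. GroupOffsets and its methods are hypothetical names chosen for illustration; a real consumer commits offsets to the cluster instead of keeping them in a local map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-group offsets on one partition; not the Kafka API.
class GroupOffsets {
    private final List<String> log;                                // the partition's messages
    private final Map<String, Integer> offsets = new HashMap<>();  // group -> next offset to read

    GroupOffsets(List<String> log) {
        this.log = log;
    }

    // Returns the next unread message for the group, or null if it has caught up.
    String poll(String group) {
        int offset = offsets.getOrDefault(group, 0);
        if (offset >= log.size()) {
            return null;
        }
        offsets.put(group, offset + 1); // record the new position for this group only
        return log.get(offset);
    }

    int position(String group) {
        return offsets.getOrDefault(group, 0);
    }

    public static void main(String[] args) {
        GroupOffsets partition = new GroupOffsets(Arrays.asList("m0", "m1", "m2"));
        System.out.println("group-a reads " + partition.poll("group-a"));
        System.out.println("group-a reads " + partition.poll("group-a"));
        // group-b starts from its own offset, unaffected by group-a.
        System.out.println("group-b reads " + partition.poll("group-b"));
    }
}
```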
Within a group, the number of consumers should not exceed the number of partitions, because each partition can be bound to at most one consumer in the group. In other words, one consumer can consume multiple partitions, but a partition can be consumed by only one consumer of the group.
Therefore, if a group has more consumers than partitions, the surplus consumers will not receive any messages.
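The effect of surplus consumers can be seen in a small sketch. It uses a plain round-robin assignment, which is an assumption made for illustration and not one of Kafka's actual assignors; GroupAssignment and assign are hypothetical names, and consumer names are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin assignment of partitions to the consumers of one group.
class GroupAssignment {
    // Each partition goes to exactly one consumer; a consumer may own several partitions.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return result;
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            result.get(owner).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three consumers, two partitions: the third consumer owns nothing.
        System.out.println(assign(Arrays.asList("c1", "c2", "c3"), 2));
        // Two consumers, three partitions: c1 owns two partitions.
        System.out.println(assign(Arrays.asList("c1", "c2"), 3));
    }
}
```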
Installing and Using Kafka
2.1. Download
You can download the latest Kafka release from the official website at http://kafka.apache.org/downloads; choose the binary tgz file. Depending on your network, you may need a proxy to reach the site. The version used here is 0.11.0.1, the latest version at the time of writing.
2.2. Installation
Kafka is written in Scala and runs on the JVM. Although it can also run on Windows, Kafka is mostly deployed on Linux servers, so we use Linux for this hands-on walkthrough.
First make sure a JDK is installed on your machine, since Kafka needs a Java runtime. Kafka also needs ZooKeeper; earlier versions required a separate installation, but newer Kafka packages bundle ZooKeeper, so we can use it directly.
Installation is simple: for a first attempt, just extract the archive to any directory. Here we extract the Kafka package to the /home directory.
2.3. Configuration
The config folder in the Kafka directory holds the configuration files:
consumer.properties: consumer configuration, used by the console consumer started in Section 2.5. The defaults are fine here.
producer.properties: producer configuration, used by the console producer started in Section 2.5. The defaults are fine here.
server.properties: Kafka server configuration. Only a few basic settings are introduced here:
1. broker.id declares the unique ID of this Kafka server in the cluster. It must be an integer, and every Kafka server in the cluster must have a different ID. The default is fine here.
2. listeners declares the address and port this Kafka server listens on. If Kafka runs in a virtual machine on your own computer, you can leave it unset and the localhost address is used by default; if it runs on a remote server it must be configured, for example:
listeners=PLAINTEXT://192.168.180.128:9092
Also make sure that port 9092 of the server is reachable.
3. zookeeper.connect declares the address of the ZooKeeper instance that Kafka connects to. Since we use the ZooKeeper bundled with this Kafka version, the default is fine:
zookeeper.connect=localhost:2181
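Before starting the broker it can help to check that these settings parse the way you expect. The sketch below loads the three settings with java.util.Properties and splits the listener address with java.net.URI. ServerConfigCheck is a hypothetical helper, the settings are supplied as an inline string instead of being read from config/server.properties, and only a single listener entry is handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.Properties;

// Hypothetical helper for inspecting a few server.properties settings.
class ServerConfigCheck {
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string is not expected to fail.
            throw new IllegalStateException(e);
        }
        return props;
    }

    // Assumes a single listener of the form PROTOCOL://host:port.
    static String listenerHost(Properties props) {
        return URI.create(props.getProperty("listeners")).getHost();
    }

    static int listenerPort(Properties props) {
        return URI.create(props.getProperty("listeners")).getPort();
    }

    public static void main(String[] args) {
        String text = "broker.id=0\n"
                + "listeners=PLAINTEXT://192.168.180.128:9092\n"
                + "zookeeper.connect=localhost:2181\n";
        Properties props = parse(text);
        System.out.println("broker.id = " + Integer.parseInt(props.getProperty("broker.id")));
        System.out.println("listener  = " + listenerHost(props) + ":" + listenerPort(props));
        System.out.println("zookeeper = " + props.getProperty("zookeeper.connect"));
    }
}
```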
2.4. Run
1. Start zookeeper
cd into the kafka decompression directory and enter
bin/zookeeper-server-start.sh config/zookeeper.properties
After successfully starting zookeeper, you will see the following output
2. Start kafka
cd into the kafka decompression directory and enter
bin/kafka-server-start.sh config/server.properties
After starting kafka successfully, you will see the following output
2.5. The first message
2.5.1 Create a topic
Kafka organizes data of the same type under a topic, so using one topic per type of data makes that data easier to process.
Open a terminal in the kafka decompression directory and enter
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
This creates a topic named test.
After creating the topic, you can enter
bin/kafka-topics.sh --list --zookeeper localhost:2181
to view the topics that have been created
2.5.2 Create a message consumer
Open a terminal in the kafka decompression directory and enter
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
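This creates a consumer for the topic test.
Since no data has been sent yet, nothing is printed after the consumer starts.
Don't worry, and don't close this terminal. Open a new terminal, and next we will create our first message producer.
2.5.3 Create a message producer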
Open a new terminal in the kafka decompression directory and enter
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
After execution, you enter an input prompt; type a message and press Enter to send it.
After sending a message, go back to the consumer terminal. You can see that the message we just sent has been printed there.
Using Kafka from a Java program
Following on from the previous section, we now use Kafka from our own Java program.
3.1 Create Topic
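import java.util.ArrayList;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.CreateTopicsResult;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) {
        // Create the topic
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.180.128:9092");
        AdminClient adminClient = AdminClient.create(props);
        ArrayList<NewTopic> topics = new ArrayList<NewTopic>();
        // Arguments: name, number of partitions, replication factor
        NewTopic newTopic = new NewTopic("topic-test", 1, (short) 1);
        topics.add(newTopic);
        CreateTopicsResult result = adminClient.createTopics(topics);
        try {
            // Block until the request has completed
            result.all().get();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        } finally {
            adminClient.close();
        }
    }
}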
The AdminClient API is used to manage the Kafka server. Here we use the NewTopic(String name, int numPartitions, short replicationFactor) constructor to create a topic named "topic-test" with 1 partition and a replication factor of 1.
3.2 Producer sends messages
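import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.180.128:9092");
        // Wait until all in-sync replicas have acknowledged each record
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("linger.ms", 1);
        props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        // Send 100 records whose key and value are both the loop index
        for (int i = 0; i < 100; i++) {
            producer.send(new ProducerRecord<String, String>("topic-test", Integer.toString(i), Integer.toString(i)));
        }
        // close() waits for buffered records to be sent before returning
        producer.close();
    }
}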
After the producer has sent its messages, you can watch them arrive with the console consumer described in Section 2.5 (subscribed to topic-test instead of test). You can also consume them with the Java consumer program described next.
3.3 Consumer consumes messages
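import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Use the same broker address as the producer in 3.2
        props.put("bootstrap.servers", "192.168.180.128:9092");
        props.put("group.id", "test");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        final KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("topic-test"), new ConsumerRebalanceListener() {
            public void onPartitionsRevoked(Collection<TopicPartition> collection) {
            }

            public void onPartitionsAssigned(Collection<TopicPartition> collection) {
                // Set the offset to the beginning
                consumer.seekToBeginning(collection);
            }
        });
        while (true) {
            // Wait up to 100 ms for new records
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
        }
    }
}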
Here we use the Consumer API to create a plain Java consumer that listens to the topic named "topic-test". Whenever a producer sends a message to the Kafka server, our consumer receives it.
Using spring-kafka
spring-kafka is a Spring sub-project that, at the time of writing, was still in the incubation stage. It builds on Spring's features to make Kafka easier to use.
4.1 Basic configuration information
As with other Spring projects, some configuration is unavoidable. Here we use Java configuration to set up our Kafka consumers and producers.
1. Add the dependency to the pom file
2. Create the configuration class
We create a new class named KafkaConfig in the main source directory
@Configuration
@EnableKafka
public class KafkaConfig {
}
3. Configure the topic
Add the following configuration to the KafkaConfig class
//topic config start
@Bean
public KafkaAdmin admin() {
    Map<String, Object> configs = new HashMap<String, Object>();
    configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.180.128:9092");
    return new KafkaAdmin(configs);
}

@Bean
public NewTopic topic1() {
    // 10 partitions, replication factor 2 (needs at least two brokers)
    return new NewTopic("foo", 10, (short) 2);
}
//topic config end
4. Configure the producer factory and template
//producer config start
@Bean
public ProducerFactory<Integer, String> producerFactory() {
    return new DefaultKafkaProducerFactory<Integer, String>(producerConfigs());
}

@Bean
public Map<String, Object> producerConfigs() {
    Map<String, Object> props = new HashMap<String, Object>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.180.128:9092");
    props.put("acks", "all");
    props.put("retries", 0);
    props.put("batch.size", 16384);
    props.put("linger.ms", 1);
    props.put("buffer.memory", 33554432);
    props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    return props;
}

@Bean
public KafkaTemplate<Integer, String> kafkaTemplate() {
    return new KafkaTemplate<Integer, String>(producerFactory());
}
//producer config end
5. Configure ConsumerFactory
//consumer config start
@Bean
public ConcurrentKafkaListenerContainerFactory<Integer, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<Integer, String> factory = new ConcurrentKafkaListenerContainerFactory<Integer, String>();
    factory.setConsumerFactory(consumerFactory());
    return factory;
}

@Bean
public ConsumerFactory<Integer, String> consumerFactory() {
    return new DefaultKafkaConsumerFactory<Integer, String>(consumerConfigs());
}

@Bean
public Map<String, Object> consumerConfigs() {
    HashMap<String, Object> props = new HashMap<String, Object>();
    props.put("bootstrap.servers", "192.168.180.128:9092");
    props.put("group.id", "test");
    props.put("enable.auto.commit", "true");
    props.put("auto.commit.interval.ms", "1000");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    return props;
}
//consumer config end
4.2 Create a message producer
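// Use spring-kafka's KafkaTemplate to send one message; to send several, call send in a loop
public static void main(String[] args) throws ExecutionException, InterruptedException {
    AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext(KafkaConfig.class);
    KafkaTemplate<Integer, String> kafkaTemplate = (KafkaTemplate<Integer, String>) ctx.getBean("kafkaTemplate");
    String data = "this is a test message";
    // Topic and key are assumed here; the listener in 4.3 subscribes to "topic-test"
    ListenableFuture<SendResult<Integer, String>> send = kafkaTemplate.send("topic-test", 1, data);
    send.addCallback(new ListenableFutureCallback<SendResult<Integer, String>>() {
        public void onFailure(Throwable throwable) {
        }

        public void onSuccess(SendResult<Integer, String> integerStringSendResult) {
        }
    });
}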
4.3 Create a message consumer
We first create a class for message listening. When the topic named "topic-test" receives a message, our listen method will be called.
public class SimpleConsumerListener {
private final static Logger logger = LoggerFactory.getLogger(SimpleConsumerListener.class);
private final CountDownLatch latch1 = new CountDownLatch(1);
@KafkaListener(id = "foo", topics = "topic-test")
public void listen(byte[] records) {
//do something here
this.latch1.countDown();
}
}
We also need to register this class as a bean in KafkaConfig
@Bean
public SimpleConsumerListener simpleConsumerListener(){
return new SimpleConsumerListener();
}
By default spring-kafka will create a thread for each listening method to pull messages from the kafka server
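The thread-per-listener behaviour, and the role of the CountDownLatch in the listener class, can be illustrated without Spring or a broker. In the sketch below a BlockingQueue stands in for the topic and a plain thread stands in for the listener container; ListenerThreadDemo and deliver are hypothetical names, and nothing here reflects how spring-kafka is implemented internally.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a listener thread; not spring-kafka code.
class ListenerThreadDemo {
    // Starts one listener thread, publishes one message, and reports whether
    // the listener handled it within the timeout.
    static boolean deliver(String message, long timeoutMs) {
        final BlockingQueue<String> topic = new LinkedBlockingQueue<>();
        final CountDownLatch latch = new CountDownLatch(1);

        Thread listener = new Thread(() -> {
            try {
                String record = topic.take(); // blocks until a message arrives
                System.out.println("received: " + record);
                latch.countDown();            // signal that the message was handled
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        listener.setDaemon(true);
        listener.start();

        topic.offer(message);
        try {
            // The caller waits on the latch instead of sleeping for a fixed time.
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("handled: " + deliver("this is a test message", 2000));
    }
}
```

A latch like this is mainly useful in tests, where the sending code needs to know that the listener thread has actually processed the message.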