1 Overview
Apache Kafka is a distributed, high-throughput streaming message system built on the ZooKeeper coordination service. It integrates well with Apache Storm and Spark for real-time streaming data analysis. Compared with other messaging systems, Kafka offers better throughput, built-in partitioning, data replication, and high fault tolerance, which makes it well suited to large-scale message processing scenarios.
For an introduction to Kafka architecture, please check: https://my.oschina.net/feinik/blog/1806488
2 Deployment Diagram
3 Environment preparation before Kafka cluster deployment
3.1 Install Java
Java 8 is recommended; install it on every server beforehand.
3.2 Deploy the Zookeeper cluster
3.2.1 Download the Zookeeper installation package
The zk version deployed here is: zookeeper-3.4.9.tar.gz
3.2.2 Installation
1. First install on server1
(1) Extract: tar -zxvf zookeeper-3.4.9.tar.gz
(2)cd zookeeper-3.4.9/conf
(3)cp zoo_sample.cfg zoo.cfg
(4) Modify the zoo.cfg configuration file as follows
tickTime=2000
# ZooKeeper data directory
dataDir=/home/hadoop/app/zookeeper/data
# Client connection port
clientPort=2181
initLimit=10
syncLimit=5
# Server addresses: 2888 is the port for communication between cluster nodes, 3888 is the port used for leader election
server.1=server1:2888:3888
server.2=server2:2888:3888
server.3=server3:2888:3888
2. Copy the zookeeper-3.4.9 directory unchanged to server2 and server3
3. Configure the ZooKeeper environment variables on each server, then start each node to complete the zk cluster deployment
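One step worth calling out for item 3: ZooKeeper identifies each node by a myid file inside dataDir whose content must match that node's server.N id in zoo.cfg; without it a node will not join the ensemble. A minimal sketch (demonstrated in a scratch directory so it is safe to run anywhere; on a real node use the dataDir from zoo.cfg, /home/hadoop/app/zookeeper/data):

```shell
# Demonstrated in a scratch directory; on a real node use the dataDir
# configured in zoo.cfg (/home/hadoop/app/zookeeper/data).
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR"
echo 1 > "$DATA_DIR/myid"   # server1 writes 1, server2 writes 2, server3 writes 3
cat "$DATA_DIR/myid"
```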
4 Deploy Kafka cluster
4.1 Install and configure
The version installed here is: kafka_2.12-1.1.0.tgz
Note: Install on server1 first, then copy the installation to server2 and server3
(1) Extract
$tar -zxvf kafka_2.12-1.1.0.tgz -C /home/app
(2) Rename
$mv kafka_2.12-1.1.0 kafka
(3) Configure the environment variables of Kafka
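A minimal sketch of step (3), assuming the variables go into the user's ~/.bash_profile (the file location and variable names are a common convention, not something Kafka mandates):

```shell
# Hypothetical ~/.bash_profile additions; the path matches the install
# location used in steps (1) and (2).
export KAFKA_HOME=/home/app/kafka
export PATH=$PATH:$KAFKA_HOME/bin
```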
(4) Modify the Kafka configuration file server.properties and modify the following configuration items
- Modify the broker id, which must be unique within the cluster
broker.id=1
- Modify the log storage directory configuration
log.dirs=/home/app/kafka/log-data
- Modify the ZooKeeper connection address. The Kafka distribution bundles a ZooKeeper, but here we point Kafka at our own zk cluster
zookeeper.connect=server1:2181,server2:2181,server3:2181
(5) Copy the kafka package deployed in server1 to server2 and server3
(6) Modify the server.properties configuration file of kafka in server2
broker.id=2
(7) Modify the server.properties configuration file of kafka in server3
broker.id=3
5 Start the cluster
5.1 Start the Zookeeper cluster first
Use the following commands to start server1, server2, and server3 respectively
$zkServer.sh start
Note: You can also start the ZooKeeper cluster with a script, provided passwordless SSH login is configured between the servers. The script content is as follows
#!/bin/bash
if (( $# != 1 )); then
    echo "Usage: zk.sh {start|stop}"
    exit 1
fi
cuser=$(whoami)
for i in server1 server2 server3; do
    echo "---------- $i ----------"
    ssh "$cuser@$i" "cd /home/app/zookeeper; ./bin/zkServer.sh $1"
done
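After all three nodes are up, each node's role can be checked with zkServer.sh status. This requires a running ensemble, so it is shown here only as an illustration:

```shell
# Run on each of server1, server2, server3 once the ensemble is up:
zkServer.sh status
# One node should report "Mode: leader", the other two "Mode: follower"
```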
5.2 Start the Kafka cluster
Use the following commands to start server1, server2, and server3 respectively
$kafka-server-start.sh -daemon /home/app/kafka/config/server.properties
Note: You can also start the Kafka cluster through a script. The script content is as follows
#!/bin/bash
cuser=$(whoami)
for i in server1 server2 server3; do
    echo "---------- $i ----------"
    ssh "$cuser@$i" "/home/app/kafka/bin/kafka-server-start.sh -daemon /home/app/kafka/config/server.properties"
    echo "$i start complete!"
done
5.3 View cluster startup status
Use the jps command to check the running processes. If server1, server2, and server3 each show a Kafka and a QuorumPeerMain process, the cluster has started successfully.
$jps
5506 Kafka
5733 Jps
5212 QuorumPeerMain
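As a further check, a test topic can be created and inspected. These commands need the running cluster, so they are an illustration only; the topic name "test" matches the one used in the Java examples below, and in Kafka 1.1.0 the kafka-topics.sh tool still addresses ZooKeeper directly:

```shell
# Create a 3-partition, 3-replica topic and verify its partition assignment
# (requires the running cluster):
kafka-topics.sh --create --zookeeper server1:2181,server2:2181,server3:2181 \
    --replication-factor 3 --partitions 3 --topic test
kafka-topics.sh --describe --zookeeper server1:2181 --topic test
```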
6 Kafka Java API Usage
6.1 The producer sends a message
import java.util.Properties;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class ProducerClient {

    private Producer<String, String> producer;

    @Before
    public void init() {
        Properties props = new Properties();
        /*
         * Broker address list. There is no need to list every broker in the
         * cluster: the producer discovers the others from the brokers given.
         * Configuring two is recommended, so the failure of a single broker
         * does not prevent the initial connection.
         */
        props.put("bootstrap.servers", "server1:9092,server2:9092");
        // Key and value serializers. Kafka provides serializer classes for
        // the common Java object types.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    @Test
    public void send() throws Exception {
        // No key is specified, so messages are distributed evenly across the
        // topic's available partitions.
        ProducerRecord<String, String> record = new ProducerRecord<>("test", "hello world");
        // Asynchronous send with a callback
        producer.send(record, new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception exception) {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.println("Message sent!");
                }
            }
        });
    }

    @After
    public void close() {
        producer.close();
    }
}
Note: There are three ways to send messages: synchronous, asynchronous, and fire-and-forget.
Synchronous: send() returns a Future; calling the Future's get() blocks until the result of the send is known.
Asynchronous: a callback is passed to send() and is invoked once the broker acknowledges the message or the send fails.
Fire-and-forget: send() is called and the result is ignored.
6.2 Consumers subscribe and consume messages
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class ConsumerClient {

    private Consumer<String, String> consumer;

    @Before
    public void init() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "server1:9092,server2:9092");
        // Consumer group identifier
        props.put("group.id", "g1");
        // Key and value deserializers
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<>(props);
    }

    @Test
    public void consume() {
        // Subscribe to the topic named "test"
        consumer.subscribe(Collections.singletonList("test"));
        while (true) {
            // Poll with a 100 ms timeout
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                String value = record.value();
                System.out.println("Received message: " + value);
            }
        }
    }

    @After
    public void close() {
        consumer.close();
    }
}
7 Summary
This article introduced the distributed cluster deployment of Kafka, along with the cluster deployment of ZooKeeper, the third-party coordination service that Kafka depends on. Finally, it demonstrated sample code for producing and consuming messages with the Kafka Java API. For other uses of Kafka, see the official site: http://kafka.apache.org/