[Kafka] Chapter 1
1. Course content
======> Course Learning
2. Introduction to Kafka
Traditional definition of Kafka: Kafka is a distributed message queue based on the publish/subscribe model, mainly used for real-time processing of big data.
======> What is distributed
Simply put, distribution means splitting a big problem into multiple small problems, solving them one by one, and then combining the results. A group of software systems that cooperate to complete a specific task is what we call a distributed system.
Publish/Subscribe: The publisher of a message will not send the message directly to specific subscribers, but will divide the published messages into different categories, and the subscribers will only receive the messages of interest.
The latest definition of Kafka: Kafka is an open-source distributed event streaming platform (Event Streaming Platform) used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
2.1 Common message queues
At present, the common message queue products in enterprises mainly include Kafka, ActiveMQ, RabbitMQ, and RocketMQ. In big data scenarios, Kafka is mainly used as the message queue; in JavaEE development, ActiveMQ, RabbitMQ, and RocketMQ are mainly used.
The main application scenarios of traditional message queues include: buffering/peak shaving, decoupling, and asynchronous communication.
2.2 Application scenarios of message queue
2.2.1 Decoupling
Decoupling: allows you to independently extend or modify the processing on either side, as long as both sides comply with the same interface constraints.
2.2.2 Asynchronous communication
Supplement: how to implement asynchronous calls in Java.
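As a minimal sketch of asynchronous execution in plain Java (independent of Kafka), `CompletableFuture` from the standard library can run a task off the main thread and deliver the result via a callback; the `send` method here is a made-up stand-in for any slow operation:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncDemo {
    // Hypothetical slow task, e.g. sending a message over the network
    static String send(String msg) {
        try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "sent:" + msg;
    }

    public static void main(String[] args) {
        // Submit the task asynchronously; the main thread is not blocked
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> send("hello"));
        // Register a callback that runs when the task completes
        future.thenAccept(result -> System.out.println(result));
        System.out.println("main thread continues");
        future.join(); // wait here only so the demo does not exit before the callback runs
    }
}
```

The key point is that `supplyAsync` returns immediately; the caller decides whether to attach callbacks or block with `join()`.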
2.3 Two modes of message queue
- Point-to-point mode
- Publish/subscribe mode
2.3.1 Point-to-point mode
The consumer actively pulls data, and the message is removed from the queue once it has been consumed; each message has only one consumer.
2.3.2 Publish/subscribe mode
There can be multiple topics; a message is not deleted after a consumer consumes it, so multiple independent consumers (subscribers) can each consume the same data.
2.4 Kafka infrastructure
(1) Producer: The message producer is the client that sends messages to the Kafka broker.
(2) Consumer: Message consumer, the client that obtains messages from the Kafka broker.
(3) Consumer Group (CG): A consumer group consists of multiple consumers. Each consumer in the group consumes data from different partitions; a partition can only be consumed by one consumer within a group. Consumer groups do not affect each other. Every consumer belongs to some consumer group, i.e., the consumer group is the logical subscriber.
(4) Broker: A Kafka server is a broker. A cluster consists of multiple brokers. A broker can accommodate multiple topics.
(5) Topic: can be understood as a queue; producers and consumers both operate on topics.
(6) Partition: To achieve scalability, a very large topic can be distributed across multiple brokers (i.e., servers). A topic can be divided into multiple partitions, and each partition is an ordered queue.
(7) Replica: Each partition of a topic has several replicas: one Leader and several Followers.
(8) Leader: The "master" among the replicas of each partition; producers send data to the leader, and consumers consume data from the leader.
(9) Follower: A "slave" among the replicas of each partition; it synchronizes data from the Leader in real time and stays consistent with the Leader. When the Leader fails, one Follower becomes the new Leader.
3. Kafka Quick Start
======> Kafka download
Unzip the Kafka tgz package to /opt/module and then rename the directory.
View server.properties in the config directory.
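The entries most worth checking in server.properties can be sketched as follows; the paths and hostnames are assumptions based on this course's hadoop102/103/104 cluster and must match your own installation:

```properties
# Must be unique for every broker in the cluster
broker.id=0
# Directory where Kafka stores message (log) data
log.dirs=/opt/module/kafka/datas
# ZooKeeper connection string; the /kafka chroot keeps Kafka's znodes in one place
zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181/kafka
```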
Distribute the Kafka installation and its configuration to the other nodes, using the distribution script below.
#!/bin/bash
# 1. Check the number of arguments
if [ $# -lt 1 ]
then
    echo "Not Enough Arguments!"
    exit
fi
# 2. Iterate over all machines in the cluster
for host in hadoop102 hadoop103 hadoop104
do
    echo "==================== $host ===================="
    # 3. Iterate over all given paths and send them one by one
    for file in "$@"
    do
        # 4. Check whether the file exists
        if [ -e "$file" ]
        then
            # 5. Get the parent directory (resolving symlinks)
            pdir=$(cd -P "$(dirname "$file")"; pwd)
            # 6. Get the file name
            fname=$(basename "$file")
            ssh "$host" "mkdir -p $pdir"
            rsync -av "$pdir/$fname" "$host:$pdir"
        else
            echo "$file does not exist!"
        fi
    done
done
Start Kafka (ZooKeeper must already be running).
Kafka start and stop scripts
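A cluster-wide start/stop script can be sketched as below; the host names and the /opt/module/kafka install path are assumptions from this course's setup, and `kafka-server-start.sh`/`kafka-server-stop.sh` are the standard scripts shipped with Kafka:

```shell
#!/bin/bash
# kf.sh — start or stop Kafka on every broker in the cluster
case $1 in
"start")
    for host in hadoop102 hadoop103 hadoop104
    do
        echo "---------- starting kafka on $host ----------"
        ssh "$host" "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties"
    done
;;
"stop")
    for host in hadoop102 hadoop103 hadoop104
    do
        echo "---------- stopping kafka on $host ----------"
        ssh "$host" "/opt/module/kafka/bin/kafka-server-stop.sh"
    done
;;
*)
    echo "Usage: $0 {start|stop}"
;;
esac
```

Note the `-daemon` flag, which runs the broker in the background; when stopping the whole cluster, stop Kafka first and ZooKeeper only after all brokers have fully exited.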
3.1 Theme command line operation
1. View the parameters of the topic command-line tool
[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh
2. View all topics in the current server
[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --list
3. Create the first topic
[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --create --partitions 1 --replication-factor 3 --topic first
Option description:
--partitions: defines the number of partitions
--replication-factor: defines the number of replicas
--topic: defines the topic name
3.2 Create a producer and send data
Send data to the topic.
Create a consumer; the consumer receives the data.
To let the consumer also read historical data, enable consumption from the beginning on the consumer side.
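The console producer/consumer steps above can be sketched with Kafka's built-in tools (run from the Kafka install directory; the `hadoop102:9092` address and topic `first` follow the earlier examples):

```shell
# Console producer: each line typed becomes a message on topic "first"
bin/kafka-console-producer.sh --bootstrap-server hadoop102:9092 --topic first

# Console consumer (in another terminal): receives only new messages by default
bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first

# Add --from-beginning to also read the historical data already in the topic
bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first --from-beginning
```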
4. Kafka producer
======> Kafka producer
List of important parameters for producers
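As a quick reference, the commonly discussed producer parameters can be sketched in properties form; descriptions are from the Kafka producer configuration, and concrete values should be verified against your client version:

```properties
# Broker address list used to bootstrap the connection
bootstrap.servers=hadoop102:9092,hadoop103:9092
# Serializers for record keys and values (full class names required)
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
# Total memory the producer can use to buffer records (default 32 MB)
buffer.memory=33554432
# Maximum size of one batch per partition (default 16 KB)
batch.size=16384
# How long to wait for a batch to fill before sending anyway (default 0 ms)
linger.ms=0
# Durability: 0 = no ack, 1 = leader only, all/-1 = all in-sync replicas
acks=all
# Compression for batches: none, gzip, snappy, lz4, or zstd
compression.type=none
```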
5. Asynchronous sending
5.1 Asynchronous sending API
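A minimal sketch of asynchronous sending with the official `kafka-clients` Java API: `send()` buffers the record and returns immediately, and the optional callback fires when the broker acknowledges. The broker addresses and topic name follow the earlier cluster examples and must be adapted to your environment:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CustomProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker addresses (hostnames from this course's cluster; adjust to yours)
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092,hadoop103:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        for (int i = 0; i < 5; i++) {
            // Asynchronous send: the call returns immediately;
            // the callback runs when the broker acknowledges (or an error occurs)
            producer.send(new ProducerRecord<>("first", "message-" + i),
                (metadata, exception) -> {
                    if (exception == null) {
                        System.out.println("topic=" + metadata.topic()
                                + ", partition=" + metadata.partition());
                    } else {
                        exception.printStackTrace();
                    }
                });
        }
        producer.close(); // flushes buffered records and waits for in-flight sends
    }
}
```

Omitting the callback (`producer.send(record)`) is also asynchronous; the callback form simply adds a hook for acknowledgements and errors.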