I: Simulation experiment:

1.1: How to look up the producer and consumer startup commands:

1.2 Commands for starting the console consumer and producer:

The consumer connects through ZooKeeper (zk):

bin/kafka-console-consumer.sh \
--zookeeper 172.17.4.16:2181,172.17.4.17:2181,172.17.217.124:2181/kafka \
--topic kunming \
--from-beginning

The producer connects to the brokers:

bin/kafka-console-producer.sh \
--broker-list 172.17.4.16:9092,172.17.4.17:9092,172.17.217.124:9092 \
--topic kunming

1.3 When you are not sure whether the cluster is healthy, use this to test that it works:

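II: Kafka core concepts:

broker: a Kafka server process
producer: produces messages, e.g. Flume
consumer: consumes messages, e.g. ss (Spark Streaming)
topic: a named stream of messages, made up of partitions plus a replica count
partition: a partition

consumer group: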

2.1: offset:

Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.

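offset: an ordered, immutable sequence number, comparable to an auto-increment primary key in MySQL
Offsets are tracked per partition, and each partition's offsets start at 0
topic: kunming

kunming-0 directory: 0, 1, 2, ...
kunming-1 directory: 0, 1, 2, ...
kunming-2 directory: 0, 1, 2, ...
Naming rule: the first segment is 00000000000000000000
the second segment is 00000000000000002000
Each segment is named after its base offset, that is, the offset of the first message it contains (one more than the last offset of the previous segment)

2.2: How to quickly find the message you need:

Example: quickly find offset 2800
00000000000000000000.index
00000000000000000000.log offsets 0-1999

00000000000000002000.index
00000000000000002000.log offsets 2000-4199

00000000000000004200.index
00000000000000004200.log offsets 4200-…

Anatomy of an index file:
00000000000000002000.index
1,0
3,22 3 is the relative offset; 22 is the physical byte position of that offset's message in the .log file
8,66
300,99
500,777

Process
1. Binary-search the segment file names for the largest base offset less than or equal to 2800: the index file is 00000000000000002000.index
2. 2800 - 2000 = 800, the relative offset
3. Binary-search the index for the largest relative offset less than or equal to 800: the entry 500,777
4. Seek to byte position 777 in the .log file, which holds the message with absolute offset 2500 (2000 + 500), then scan forward in order until reaching the message with offset 2800

1. The .log file stores the messages, with every offset
2. The .index file stores relative offsets and the physical byte positions of the corresponding messages, as a sparse index

Relative and absolute offsets convert into each other through the base offset in the file name; starting from the nearest index entry, a sequential scan then finds the required offset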
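
The first three lookup steps can be sketched in a few lines of Python. This is a minimal sketch over in-memory lists, not Kafka's own implementation: `SEGMENT_BASES`, `INDEX_2000`, and the function names are hypothetical, the sample values are copied from the example above, and the final sequential scan of the .log file is left out.

```python
from bisect import bisect_right

# Hypothetical in-memory stand-ins for the files in the example:
# base offsets taken from the segment file names, and the sparse
# index entries (relative offset, byte position) of segment ...2000.
SEGMENT_BASES = [0, 2000, 4200]
INDEX_2000 = [(1, 0), (3, 22), (8, 66), (300, 99), (500, 777)]


def find_segment(bases, target):
    """Step 1: largest base offset <= target, found by binary search."""
    i = bisect_right(bases, target) - 1
    if i < 0:
        raise ValueError("target offset is before the first segment")
    return bases[i]


def find_index_entry(index, relative):
    """Step 3: index entry with the largest relative offset <= relative."""
    keys = [rel for rel, _ in index]
    i = bisect_right(keys, relative) - 1
    if i < 0:
        return (0, 0)  # nothing indexed before it: scan from the start of the file
    return index[i]


def locate(target, bases, index):
    """Return (base, relative offset, index entry, absolute offset of that entry)."""
    base = find_segment(bases, target)
    relative = target - base                      # step 2
    rel, pos = find_index_entry(index, relative)
    return base, relative, (rel, pos), base + rel


base, relative, entry, scan_from = locate(2800, SEGMENT_BASES, INDEX_2000)
print(base, relative, entry, scan_from)
```

The sequential scan of step 4 would start at the returned byte position and read forward until the target offset is reached.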

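III: Delivery semantics: 3 kinds

1. at most once: each message is consumed 0 or 1 times; messages may be lost but are never consumed twice --> acceptable for logs
2. at least once: each message is consumed 1 or more times; messages are never lost but may be consumed more than once --> what most systems use
3. exactly once: each message is consumed exactly 1 time; nothing is lost and nothing is repeated, but the code and the external storage take a lot of maintenance
The offsets are kept in external storage such as zk, HBase, or Redis, so that each record is consumed exactly once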
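
The difference between the first two semantics can be illustrated with a toy simulation. It assumes, as is commonly the case, that the semantics follow from the order of "commit the offset" and "process the message" around a crash; `consume`, `commit_first`, and `crash_at` are hypothetical names, and no Kafka client is involved.

```python
def consume(messages, commit_first, crash_at):
    """Simulate a consumer that crashes once at index crash_at, then
    restarts from the last committed offset. Returns what was processed."""
    processed = []
    committed = 0      # next offset to read after a restart
    crashed = False
    pos = committed
    while pos < len(messages):
        if commit_first:
            committed = pos + 1                  # commit before processing
            if pos == crash_at and not crashed:
                crashed = True
                pos = committed                  # restart: this message is skipped
                continue
            processed.append(messages[pos])
        else:
            processed.append(messages[pos])      # process before committing
            if pos == crash_at and not crashed:
                crashed = True
                pos = committed                  # restart: this message is read again
                continue
            committed = pos + 1
        pos += 1
    return processed


msgs = ["m0", "m1", "m2"]
print(consume(msgs, commit_first=True, crash_at=1))   # at most once: m1 is lost
print(consume(msgs, commit_first=False, crash_at=1))  # at least once: m1 is repeated
```

Exactly once closes the gap between the two steps, for example by writing the result and the offset to the external store together, which is where the extra maintenance cost comes from.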

IV: How to achieve global ordering:

4.1: The requirement:

The following records arrive and land in these partitions:
insert 1 11:00 p0
update 2 11:02 p1
update 3 11:04 p1
update 5 11:07 p0

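p0:
insert 1
update 5
--> Spark Streaming reads: insert 1, update 5, update 2, update 3 ==> the last one applied is update 3
p1:
update 2
update 3

Business order: 1 2 3 5
Consumed order: 1 5 2 3
Data from a business system such as an ERP is now out of order. How do we keep it ordered?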
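
The reordering can be reproduced with a short simulation. It assumes one possible interleaving, in which the consumer drains p0 before p1; the variable names are illustrative only.

```python
# (operation, id, time, partition), as listed in the example
records = [
    ("insert", 1, "11:00", "p0"),
    ("update", 2, "11:02", "p1"),
    ("update", 3, "11:04", "p1"),
    ("update", 5, "11:07", "p0"),
]

# Each partition keeps its own records in arrival order.
partitions = {}
for op, rec_id, ts, part in records:
    partitions.setdefault(part, []).append(rec_id)

business_order = [rec_id for _, rec_id, _, _ in records]
# Assumed interleaving: everything in p0 is read before anything in p1.
consumed_order = partitions["p0"] + partitions["p1"]

print("business:", business_order)
print("consumed:", consumed_order)
```

Order is preserved inside each partition, but nothing ties the two partitions together.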

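4.2: Global ordering:

1. One topic with a single partition (3 replicas): this guarantees global ordering, but throughput drops;
2. Ordering holds within a single partition, so find a way to write all records that share the same feature value to the same partition
p0 p1 p2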

id money
feature value (key)
insert into t values(1,1) erp.t.1 hash = 5, 5 % 3 = 2 (quotient 1, remainder 2) --> p2
update t set money = 200 where id = 1 erp.t.1
update t set money = 400 where id = 1 erp.t.1
update t set money = 1000000 where id = 1 erp.t.1
delete from t where id = 1 erp.t.1

The final result is 0 rows

producer send API: (key, value)
key: erp.t.1 (may be null)
value: the SQL statement

https://github.com/apache/kafka/blob/3cdc78e6bb1f83973a14ce1550fe3874f7348b05/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java
All 5 records are sent, in order, to the same partition, so they are not consumed out of order either
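
Key-based routing can be sketched as follows. This is an illustration only: Kafka's DefaultPartitioner hashes the key bytes with murmur2, while this sketch swaps in `zlib.crc32` to stay short, so the partition number it picks will generally differ from the one Kafka would pick. `choose_partition` and the shortened operation labels are hypothetical.

```python
import zlib

NUM_PARTITIONS = 3


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition: hash(key) % number of partitions.
    crc32 is a stand-in here; Kafka itself uses murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# The five statements for row id = 1, abbreviated, in business order.
operations = ["insert", "update 200", "update 400", "update 1000000", "delete"]

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for op in operations:
    # Same key for every statement, so the same partition every time.
    partitions[choose_partition("erp.t.1")].append(op)

target = choose_partition("erp.t.1")
print("partition", target, "holds", partitions[target])
```

Because the hash depends only on the key, every statement for the same row lands in one partition, and that partition preserves the send order.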

4.3 Tuning parameters:

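V: Monitoring Kafka:

1. The producer and consumer curves follow the same trend at the same rate.
No messages pile up and consumption keeps pace; the ss --> HBase path is under no pressure

2. Why is the consumer curve higher than the producer curve at the same moment?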
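Producer
send(null, "hello")
Consumer {
kunming: g5
partition: p2
offset: 10000
value: "hello"
}

"hello": 15 bytes
{
topic: kunming
partition: p2
offset: 10000
timestamp: xxxxxx
length:
value: "hello"
}: 60 bytes

The record the consumer receives carries metadata (topic, partition, offset, timestamp, length) in addition to the value, so in this example the same message accounts for more bytes on the consumer side (60) than on the producer side (15).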
