Kafka Data Storage Format

Reposted from: http://www.hemingliang.site/308.html

Viewing how a topic's data is distributed

[hadoop@m2 kafka_2.10-0.10.2.1]$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
[2017-06-22 15:01:02,628] WARN Connected to an old server; r-o mode will be unavailable (org.apache.zookeeper.ClientCnxnSocket)
Topic:test      PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: test     Partition: 0    Leader: 1       Replicas: 1     Isr: 1

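Leader: the id of the broker that hosts the leader replica of the partition
Replicas: the ids of the brokers that hold a replica of the partition
Isr: the in-sync replicas, i.e. the ids of the brokers that are eligible to become leader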

      The output above shows that the partition of test lives on the broker with id 1. Go into kafka_2.10-0.10.2.1/kafka-logs, which is the directory set by log.dirs in server.properties

      This directory contains a test-0 directory. Log directories are named topic name-partition number. The contents of test-0 are as follows

[hadoop@m2 kafka-logs]$ cd test-0/
[hadoop@m2 test-0]$ ls
00000000000000000000.index  00000000000000000000.log  00000000000000000000.timeindex

     The data files consist of a .index file, a .log file, and a .timeindex file

     The contents of these files can be inspected with kafka-run-class.sh, found in the bin directory of the Kafka installation
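
The directory naming rule described above (topic name, a hyphen, then the partition number) can be sketched in a few lines of Python. This is only an illustration: `parse_partition_dir` is a hypothetical helper, not part of Kafka, and it assumes the partition number is whatever follows the last hyphen, since topic names may themselves contain hyphens.

```python
def parse_partition_dir(dir_name: str) -> tuple[str, int]:
    """Split a log directory name such as 'test-0' into (topic, partition)."""
    # Split on the LAST hyphen only, so 'my-topic-3' gives ('my-topic', 3).
    topic, sep, partition = dir_name.rpartition("-")
    if not sep or not topic or not partition.isdigit():
        raise ValueError(f"not a topic-partition directory name: {dir_name!r}")
    return topic, int(partition)


print(parse_partition_dir("test-0"))
```

For the directory shown above this yields the topic test and partition 0.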

Viewing the log file

[hadoop@m2 test-0]$ ../../bin/kafka-run-class.sh  kafka.tools.DumpLogSegments --files 00000000000000000000.log  --print-data-log  
Dumping 00000000000000000000.log
Starting offset: 0
offset: 0 position: 0 CreateTime: 1498104812192 isvalid: true payloadsize: 11 magic: 1 compresscodec: NONE crc: 3271928089 payload: hello world
offset: 1 position: 45 CreateTime: 1498104813269 isvalid: true payloadsize: 14 magic: 1 compresscodec: NONE crc: 242183772 payload: hello everyone
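
The position column is the byte offset of each message inside the .log file. As a rough check of the numbers above, the sketch below adds up the fixed fields of a single uncompressed message with magic 1 and no key. The field sizes are an assumption based on the 0.10 message format as commonly documented (8-byte offset, 4-byte size, 4-byte CRC, 1-byte magic, 1-byte attributes, 8-byte timestamp, 4-byte key length, 4-byte value length); `entry_size` is a hypothetical helper, not a Kafka API.

```python
# Assumed fixed per-message overhead for magic 1, in bytes:
# offset(8) + size(4) + crc(4) + magic(1) + attributes(1)
# + timestamp(8) + key length(4) + value length(4)
V1_OVERHEAD = 8 + 4 + 4 + 1 + 1 + 8 + 4 + 4


def entry_size(value: bytes, key: bytes = b"") -> int:
    """Bytes one message occupies in the .log file (a null key adds 0 bytes)."""
    return V1_OVERHEAD + len(key) + len(value)


first = entry_size(b"hello world")      # payloadsize 11
second = entry_size(b"hello everyone")  # payloadsize 14
print(first)           # position at which the second message starts
print(first + second)  # position at which a third message would start
```

Under these assumptions the first message occupies 34 + 11 = 45 bytes, which agrees with the position: 45 reported for offset 1.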

Viewing the index file

[hadoop@m2 test-0]$ ../../bin/kafka-run-class.sh  kafka.tools.DumpLogSegments --files 00000000000000000000.index  --print-data-log  
Dumping 00000000000000000000.index
offset: 0 position: 0

Viewing the timeindex file

[hadoop@m2 test-0]$ ../../bin/kafka-run-class.sh  kafka.tools.DumpLogSegments --files 00000000000000000000.timeindex  --print-data-log  
Dumping 00000000000000000000.timeindex
timestamp: 1498104813269 offset: 1
Found timestamp mismatch in :/home/hadoop/apps/kafka_2.10-0.10.2.1/kafka-logs/test-0/00000000000000000000.timeindex
  Index timestamp: 0, log timestamp: 1498104812192
Found out of order timestamp in :/home/hadoop/apps/kafka_2.10-0.10.2.1/kafka-logs/test-0/00000000000000000000.timeindex
  Index timestamp: 0, Previously indexed timestamp: 1498104813269
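
The timestamp and CreateTime values in these dumps are milliseconds since the Unix epoch. A minimal Python sketch for turning them into readable UTC times is given below; `kafka_ts_to_utc` is a hypothetical helper name used only for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def kafka_ts_to_utc(ts_ms: int) -> datetime:
    """Convert a millisecond epoch timestamp to a timezone-aware UTC datetime."""
    # Integer milliseconds are added via timedelta to avoid float rounding.
    return EPOCH + timedelta(milliseconds=ts_ms)


print(kafka_ts_to_utc(1498104812192).isoformat())  # CreateTime of offset 0
print(kafka_ts_to_utc(1498104813269).isoformat())  # timeindex entry for offset 1
```

Both values fall on 2017-06-22, the same day as the kafka-topics.sh run shown earlier.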

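      The index file and the log file together make up a segment. Segment files are named as follows: the first segment of a partition starts at 0, and every later segment is named after the offset of its first message, that is, the last offset of the previous segment plus one. The value is a 64-bit long, written as 20 digits and left-padded with zeros. The log.segment.bytes parameter sets the maximum size of a log file; once a file would grow past this size, a new segment file is created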
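
The naming rule above can be sketched in Python as follows. Both functions are hypothetical helpers written for illustration, not Kafka APIs, and the base offsets 170410 and 239430 are invented sample values. Because segment names are sorted base offsets, the segment that holds a given offset can be found with a binary search.

```python
import bisect

MAX_OFFSET = 2**63 - 1  # largest value of a signed 64-bit long


def segment_file_name(base_offset: int, suffix: str = ".log") -> str:
    """Name of the segment file whose first message has the given offset."""
    if not 0 <= base_offset <= MAX_OFFSET:
        raise ValueError(f"offset out of range: {base_offset}")
    # Zero-pad to 20 digits, matching the file names listed earlier.
    return f"{base_offset:020d}{suffix}"


def find_segment(base_offsets: list[int], target_offset: int) -> int:
    """Return the base offset of the segment that would contain target_offset.

    base_offsets must be sorted in ascending order.
    """
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError(f"offset {target_offset} is before the first segment")
    return base_offsets[i]


segments = [0, 170410, 239430]  # made-up base offsets for illustration
print(segment_file_name(0))
print(segment_file_name(find_segment(segments, 200000), ".index"))
```

With these sample values, offset 200000 falls in the segment whose files are named 00000000000000170410.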

Reposted from blog.csdn.net/aa5305123/article/details/84350779