Integrating Kafka with Flume

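Beginners often have a basic understanding of Kafka and Flume but have never used them in practice and are unsure where to start.
This article walks through integrating Kafka with Flume.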

Environment: Linux

Prerequisites: a working ZooKeeper cluster and Kafka cluster

Versions: kafka_2.11-1.1.0, flume-1.8.0

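Steps:

1. Start the ZooKeeper cluster by running zkServer.sh start on every node.
2. Start the Kafka cluster and check that it is running.
From the Kafka installation root, run the following command on every node:
bin/kafka-server-start.sh config/server.properties &
Use the jps command to check that a Kafka process is listed.

The broker keeps writing its log to this terminal, so open a new terminal window for the remaining steps.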
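
The jps check above can also be scripted. The following is a minimal sketch, assuming the broker appears in jps output under the class name Kafka (ZooKeeper normally appears as QuorumPeerMain); the function name has_kafka_process is made up for this example.

```shell
# has_kafka_process: read `jps` output on stdin and succeed if a broker is listed.
# jps prints "<pid> <main class>"; the broker's main class is shown as "Kafka".
has_kafka_process() {
    awk '$2 == "Kafka" { found = 1 } END { exit !found }'
}

# Typical use on a broker node (requires a JDK on the PATH):
#   jps | has_kafka_process && echo "Kafka broker is running"
```

Running the check on every node is a quick way to confirm that all brokers are up before creating topics.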

3. List all topics

bin/kafka-topics.sh --list --zookeeper huaxia01:2181,huaxia02:2181,huaxia03:2181

Run the command above from the Kafka root directory.
4. Create a new topic

bin/kafka-topics.sh --create --topic news-logs-1807 --zookeeper huaxia01:2181,huaxia02:2181,huaxia03:2181 --partitions 3 --replication-factor 3

5. After the topic is created, describe it to check its settings

bin/kafka-topics.sh --describe --topic news-logs-1807 --zookeeper huaxia01:2181,huaxia02:2181,huaxia03:2181

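If your Kafka version differs, these commands may fail; adjust them according to the error message. For example, from Kafka 2.2 onward kafka-topics.sh accepts --bootstrap-server with a broker list, and the --zookeeper option was removed in Kafka 3.0.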
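
The long host lists in the topic commands are easy to mistype. The sketch below builds them from the host names and prints both forms of the describe command without running anything. The helper name join_hosts is made up for this example, and the broker port 9092 on all three hosts is an assumption.

```shell
# join_hosts PORT HOST...: print HOST:PORT pairs joined by commas.
join_hosts() {
    port=$1
    shift
    list=""
    for host in "$@"; do
        list="${list:+$list,}$host:$port"
    done
    printf '%s\n' "$list"
}

ZK=$(join_hosts 2181 huaxia01 huaxia02 huaxia03)
BROKERS=$(join_hosts 9092 huaxia01 huaxia02 huaxia03)   # assumed broker port

# Print the two forms of the describe command; nothing is executed here.
echo "bin/kafka-topics.sh --describe --topic news-logs-1807 --zookeeper $ZK"
echo "bin/kafka-topics.sh --describe --topic news-logs-1807 --bootstrap-server $BROKERS"
```

The first form matches Kafka 1.1.0 as used in this article; the second is the shape newer releases expect.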

6. Create a producer and a consumer
See https://blog.csdn.net/u011116672/article/details/76400861 for how producers and consumers relate to each other.

7. In Flume's conf directory, create a new file named flume-kafka-sink.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

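# Source: an exec source that follows new lines appended to the log file
a1.sources.r1.type = exec
# This must be the same file that the data script writes to
a1.sources.r1.command  = tail -F /home/huaxia/data/projects/news/data/news_log_rt.log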


# Sink: publish each event to Kafka
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = huaxia01:9092,huaxia02:9092,huaxia03:9092

# The topic created earlier
a1.sinks.k1.kafka.topic = news-logs-1807
a1.sinks.k1.kafka.flumeBatchSize = 1000
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1

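# Channel: an in-memory buffer between source and sink. It is fast, but buffered
# events are lost if the agent stops; a file channel (type = file) is the safer choice.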
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 1000

# Bind source r1 and sink k1 together through channel c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
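
Before starting the agent, it can help to confirm that the config points at the topic created earlier. The helper below is a minimal sketch for reading one key from a properties file; the name prop_value is made up for this example, and it only handles simple key = value lines.

```shell
# prop_value FILE KEY: print the value of KEY from a Flume properties file.
# Lines starting with '#' are skipped and whitespace around '=' is trimmed.
prop_value() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }
        {
            pos = index($0, "=")
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) print v
        }
    ' "$1"
}

# Usage, from Flume's conf directory:
#   prop_value flume-kafka-sink.conf a1.sinks.k1.kafka.topic
#   prop_value flume-kafka-sink.conf a1.sinks.k1.kafka.bootstrap.servers
```

If the printed topic is not news-logs-1807, the console consumer started later will not see any of the events.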

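8. Verify the Kafka and Flume integration
Create the file /home/huaxia/data/projects/news/data/news_log_rt.log to hold the data.

9. Create the script news_log_rt.sh in the same directory as news_log_rt.log
Any script that appends lines to the log file will do; here is an example.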

#!/bin/bash
# Appends 50 random records (timestamp,province,city,userid,advid) to news_log_rt.log.
arr=("hubei_wuhan" "hebei_shijiazhuang" "guangdong_guangzhou" "jiangsu_nanjing" "hunan_changsha")

# rand MIN MAX: print a pseudo-random integer in MIN..MAX, derived from the nanosecond clock
function rand() {
    local min=$1
    local max=$(($2 - min + 1))
    local num
    num=$(date +%s%N)
    echo $((num % max + min))
}

for ((i = 0; i < 50; i++)); do
    a=${arr[$(rand 0 4)]}                      # pick one province_city pair
    timeStamp=$(date +%s)                      # current time as epoch seconds
    province=$(echo "$a" | cut -d _ -f 1)
    city=$(echo "$a" | cut -d _ -f 2)
    userid=$(rand 0 10)
    advid="10000$(rand 0 3)"
    echo "${timeStamp},${province},${city},${userid},${advid}" >> news_log_rt.log
done
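
The rand function in the script folds a large clock value into a small inclusive range. The sketch below isolates that arithmetic with fixed inputs so the result can be checked by hand; map_to_range is a name made up for this example.

```shell
# map_to_range NUM MIN MAX: fold NUM into the inclusive range MIN..MAX,
# using the same arithmetic as the script's rand function.
map_to_range() {
    span=$(( $3 - $2 + 1 ))     # how many integers lie in MIN..MAX
    echo $(( $1 % span + $2 ))
}

map_to_range 17 0 4    # 17 % 5 + 0
map_to_range 23 3 7    # 23 % 5 + 3
```

Because the input comes from the clock rather than a random number generator, values drawn in quick succession are not independent, which is acceptable for generating test data.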

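10. Start a Kafka console consumer on the news-logs-1807 topic
Run the following from the Kafka installation directory. The --new-consumer flag is deprecated in Kafka 1.x and was removed in 2.0; with --bootstrap-server it can be omitted.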

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic news-logs-1807 --new-consumer --from-beginning --consumer.config config/consumer.properties

11. Start Flume
From the Flume installation directory, run:

bin/flume-ng agent -n a1 -c conf -f conf/flume-kafka-sink.conf -Dflume.root.logger=INFO,console
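
The name passed with -n has to match the prefix used in the config file (a1 here). If it does not, the agent typically starts but runs no source, channel, or sink. The check below is a small sketch of how to catch that before launching; agent_defined is a name made up for this example.

```shell
# agent_defined FILE NAME: succeed if FILE declares sources for agent NAME,
# i.e. contains a line of the form "NAME.sources = ...".
agent_defined() {
    grep -q "^[[:space:]]*$2\.sources[[:space:]]*=" "$1"
}

# Usage, from the Flume installation directory:
#   agent_defined conf/flume-kafka-sink.conf a1 && echo "agent a1 is defined"
```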

After a successful start, the agent stays in the foreground and prints its log to the console.
12. Run news_log_rt.sh from the directory that contains it

bash news_log_rt.sh

13. Check the new lines in /home/huaxia/data/projects/news/data/news_log_rt.log
The same records should appear on the consumer console at the same time.
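
To check the log file without reading it line by line, the records can be counted and validated. This is a minimal sketch, assuming the five-field format the example script writes (timestamp,province,city,userid,advid); count_valid_records is a name made up for this example.

```shell
# count_valid_records: read log lines on stdin and print how many have five
# comma-separated fields with a numeric timestamp, userid, and advid.
count_valid_records() {
    awk -F',' '
        NF == 5 && $1 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ { ok++ }
        END { print ok + 0 }
    '
}

# Usage, from the data directory:
#   count_valid_records < news_log_rt.log
```

After one run of the script on an empty log file, the count should be 50, matching the loop in news_log_rt.sh.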