About the connection between Flume and Kafka

Before connecting Flume and Kafka, we should understand why the two systems need to be connected at all:
1. In a production environment, logs are often read from multiple data sources for analysis. Using Kafka alone is very inconvenient: you would have to build multiple producers that write file data into topics just so consumers can consume it. By connecting the two, Flume collects the log files once and makes them available to multiple downstream systems.
2. Flume can use interceptors to process data in real time, which is very useful for masking or filtering data; Kafka would need an external stream-processing system to do this.
3. If Flume is connected directly to a real-time computing framework, data accumulation or data loss can easily occur when the collection speed exceeds the processing speed. Kafka can act as a message buffer queue that stores data for a period of time and smooths out peaks.

Steps to connect Flume and Kafka:
1. Create the file kafka.conf in the /opt/module/flume/job directory on the hadoop102 machine.
Here we use a netcat Source, a Memory Channel, and a Kafka Sink, then fill in the following content.
Pay attention to a1.sinks.k1.kafka.topic = xxx in the Sink section: write the name of your own Kafka topic there. For a1.sinks.k1.kafka.bootstrap.servers = xxx, write the host names of your cluster; 9092 is Kafka's port number. (If the topic does not exist yet, see the creation sketch after the configuration.)

#Name
a1.sources = r1
a1.channels = c1
a1.sinks = k1

#Source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444


#Channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


#Sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = demo2
a1.sinks.k1.kafka.bootstrap.servers = hadoop102:9092,hadoop103:9092,hadoop104:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1


#Bind
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
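If the demo2 topic does not exist in Kafka yet, create it before starting the agent. A minimal sketch, assuming a 3-node cluster and a Kafka version that still manages topics through ZooKeeper (the same --zookeeper style used by the consumer command later); adjust the partition and replication-factor values to your needs:

# Run in the Kafka installation directory; partition/replication values are example choices
bin/kafka-topics.sh --create --zookeeper hadoop102:2181 --topic demo2 --partitions 1 --replication-factor 3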

2. After saving the above content, start ZooKeeper and Kafka in the cluster.
Note that ZooKeeper must be started first:
in the ZooKeeper installation directory, execute bin/zkServer.sh start
in the Kafka installation directory, execute bin/kafka-server-start.sh -daemon config/server.properties
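To confirm that both services are actually running on each node, a quick check with jps (a minimal sketch; QuorumPeerMain is the ZooKeeper process and Kafka is the broker process):

# Run on each of hadoop102 / hadoop103 / hadoop104
jps
# The output should contain lines similar to:
# 2345 QuorumPeerMain
# 3456 Kafka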

3. Start a consumer on hadoop102:
in the Kafka installation directory, execute bin/kafka-console-consumer.sh --zookeeper hadoop102:2181 --topic demo2. Here I use the demo2 topic.
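Note that newer Kafka versions have removed the --zookeeper option from the console consumer. If your version rejects it, the equivalent command connects to the broker directly, for example (assuming the default Kafka port 9092):

bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic demo2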

Then open two hadoop102 windows in Xshell. In one window, start Flume by executing [root@hadoop102 flume]# bin/flume-ng agent -c conf/ -f job/kafka.conf -n a1 in the Flume installation directory.
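If you prefer to watch the agent's log on screen while testing, Flume can print it to the console. A sketch of the same start command with console logging enabled:

bin/flume-ng agent -c conf/ -f job/kafka.conf -n a1 -Dflume.root.logger=INFO,console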

Then, in the other window, enter [root@hadoop102 ~]# nc localhost 44444

Then we can send it any content, and we can see that the messages passing through Flume are received in Kafka.
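A minimal sketch of the test session (the text you send is arbitrary); the netcat source answers each line with OK, and the same line should appear in the consumer window:

# nc window
hello flume
OK
# kafka-console-consumer window
hello flume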

Origin blog.csdn.net/weixin_44080445/article/details/107425277