For a detailed introduction to Flume and its configuration, see:
https://blog.csdn.net/zht245648124/article/details/90137807
1. Sinking Flume data to an HDFS directory
Data buffered in the channel can ultimately be persisted to HDFS; the configuration file is as follows:
#####################################################################
## Listen for data arriving on a network port
## and buffer it in a memory channel.
## This agent consists of a source (r1), a sink (k1),
## and a channel (c1).
##
## a1 is the name of this Flume agent instance.
#####################################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# How the source receives data; here we listen on a network port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888
# How the collected data is sunk (persisted): written to a path on HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /input/flume/%Y/%m/%d
# Prefix for finalized files
a1.sinks.k1.hdfs.filePrefix = http
# Suffix for finalized files, e.g. http.1521927418991.log
a1.sinks.k1.hdfs.fileSuffix = .log
# Prefix for files while they are still being written to
a1.sinks.k1.hdfs.inUsePrefix = xttzm.
# Suffix for in-use files, e.g. xttzm.http.1521927418992.log.zdhm
a1.sinks.k1.hdfs.inUseSuffix = .zdhm
# Roll (finalize) the current file every 10 seconds,
a1.sinks.k1.hdfs.rollInterval = 10
# or when it reaches 10 bytes,
a1.sinks.k1.hdfs.rollSize = 10
# or after 5 events, whichever comes first
a1.sinks.k1.hdfs.rollCount = 5
# Use the agent's local time to resolve the %Y/%m/%d escapes in the path
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Defaults to SequenceFile, whose files look serialized when viewed on HDFS
a1.sinks.k1.hdfs.fileType = DataStream
# Must be set together with fileType above; events are written as plain text
a1.sinks.k1.hdfs.writeFormat = Text
# Without the following setting, rollInterval/rollSize/rollCount will not take effect
a1.sinks.k1.hdfs.minBlockReplicas = 1
# Channel section: use memory for temporary buffering
a1.channels.c1.type=memory
# Connect the source and the sink through the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the Flume agent:
./flume-ng agent -c conf -n a1 -f ../conf/flume-sink_hdfs.conf -Dflume.root.logger=INFO,console
Send data with nc:
$ nc localhost 8888
1
OK
2
OK
3
OK
With that, three finalized files are created in the HDFS directory, along with one in-use temporary file:
$ hdfs dfs -ls /input/flume/2018/03/25/
Found 4 items
-rw-r--r-- 3 uplooking supergroup 10 2018-03-25 06:00 /input/flume/2018/03/25/http.1521928799720.log
-rw-r--r-- 3 uplooking supergroup 11 2018-03-25 06:00 /input/flume/2018/03/25/http.1521928799721.log
-rw-r--r-- 3 uplooking supergroup 15 2018-03-25 06:00 /input/flume/2018/03/25/http.1521928799722.log
-rw-r--r-- 3 uplooking supergroup 3 2018-03-25 06:00 /input/flume/2018/03/25/xttzm.http.152192879
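Because fileType is DataStream and writeFormat is Text, the finalized files hold the raw event bodies as plain text and can be read back directly, e.g. (using the first file from the listing above):
$ hdfs dfs -cat /input/flume/2018/03/25/http.1521928799720.log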
2. Sinking Flume data to HBase
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888
a1.sinks.k1.type = hbase
# HBase table name (a trailing comment on the same line would be read as part of the value)
a1.sinks.k1.table = ns1:t12
# HBase column family
a1.sinks.k1.columnFamily = f1
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.channels.c1.type=memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the Flume agent:
./flume-ng agent -c conf -n a1 -f ../conf/flume-sink_hbase.conf -Dflume.root.logger=INFO,console
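Note that the HBase sink does not create the table: ns1:t12 with column family f1 must exist before the agent starts. A minimal setup from the HBase shell (a sketch, assuming the ns1 namespace also still needs to be created):
$ hbase shell
> create_namespace 'ns1'
> create 'ns1:t12', 'f1'
After sending lines via nc localhost 8888, scan 'ns1:t12' should show one row per event; with the default RegexHbaseEventSerializer settings the event body lands in the f1:payload column.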
3. Multi-hop agent processing with an Avro source and an Avro sink
1. Create the configuration files
# a1 [avro-hop.conf]
a1.sources = r1
a1.sinks= k1
a1.channels = c1
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=8888
a1.sinks.k1.type = avro
a1.sinks.k1.hostname=localhost
a1.sinks.k1.port=9999
a1.channels.c1.type=memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# a2 [avro-source.conf]
a2.sources = r2
a2.sinks= k2
a2.channels = c2
a2.sources.r2.type=avro
a2.sources.r2.bind=localhost
a2.sources.r2.port=9999
a2.sinks.k2.type = logger
a2.channels.c2.type=memory
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
2. Start a2:
$> ./flume-ng agent -c conf -n a2 -f ../conf/avro-source.conf -Dflume.root.logger=INFO,console
3. Verify that a2 is listening on port 9999:
$>netstat -anop | grep 9999
4. Start a1:
$>./flume-ng agent -c conf -n a1 -f ../conf/avro-hop.conf -Dflume.root.logger=INFO,console
5. Verify that a1 is listening on port 8888:
$>netstat -anop | grep 8888
6. Start nc and send data:
nc localhost 8888
>hello
>ni hao
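Each line then appears on a2's console via the logger sink, confirming that events hop from a1 over Avro to a2. In practice the terminal agent's logger sink would be swapped for a durable one; for example, a sketch that reuses the HDFS sink settings from section 1 (the path here is illustrative):
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = /input/flume/%Y/%m/%d
a2.sinks.k2.hdfs.fileType = DataStream
a2.sinks.k2.hdfs.writeFormat = Text
a2.sinks.k2.hdfs.useLocalTimeStamp = true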
4. Sinking Flume data to Kafka
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=8888
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = flume_source
a1.sinks.k1.kafka.bootstrap.servers = 192.168.100.11:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.channels.c1.type=memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume:
./flume-ng agent -c conf -n a1 -f ../conf/flume-sink_kafka.conf -Dflume.root.logger=INFO,console
Start a Kafka console consumer to view the delivered data:
bin/kafka-console-consumer.sh --bootstrap-server 192.168.100.11:9092 --topic flume_source --from-beginning
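If automatic topic creation is disabled on the broker, create flume_source before starting the agent (a sketch, assuming Kafka 2.2+, where kafka-topics.sh accepts --bootstrap-server):
bin/kafka-topics.sh --create --bootstrap-server 192.168.100.11:9092 --topic flume_source --partitions 1 --replication-factor 1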
5. Kafka as the Flume data source
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = 192.168.100.11:9092
a1.sources.r1.kafka.topics = flume_source
a1.sources.r1.kafka.consumer.group.id = g4
a1.sinks.k1.type = logger
a1.channels.c1.type=memory
a1.channels.c1.capacity = 5000
a1.channels.c1.transactionCapacity = 5000
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume:
./flume-ng agent -c conf -n a1 -f ../conf/flume-source_kafka.conf -Dflume.root.logger=INFO,console
Start a Kafka console producer and send data to the topic the source subscribes to:
bin/kafka-console-producer.sh --broker-list 192.168.100.11:9092 --topic flume_source
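Each line typed into the producer should then be printed on the agent's console by the logger sink. To confirm that the agent is consuming as group g4, the consumer group can be inspected (same broker as above):
bin/kafka-consumer-groups.sh --bootstrap-server 192.168.100.11:9092 --describe --group g4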