Flume sink examples: HDFS, HBase, Avro, and Kafka

For a detailed introduction to Flume and its configuration, see:
https://blog.csdn.net/zht245648124/article/details/90137807

1. Sinking Flume data to an HDFS directory
Data buffered in the channel can be persisted to HDFS with the following configuration file:

#####################################################################
## Listen on a network port for incoming data
## and buffer it in a memory channel.
## This agent consists of source r1, sink k1, and channel c1.
##
## a1 is the name of this Flume agent instance.
#####################################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source: listen on a network port (netcat)
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

# Sink: write the collected data to a path on HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /input/flume/%Y/%m/%d
# prefix for finalized files
a1.sinks.k1.hdfs.filePrefix = http
# suffix for finalized files, e.g. http.1521927418991.log
a1.sinks.k1.hdfs.fileSuffix = .log
# prefix while a file is still being written
a1.sinks.k1.hdfs.inUsePrefix = xttzm.
# suffix while a file is still being written, e.g. xttzm.http.1521927418992.log.zdhm
a1.sinks.k1.hdfs.inUseSuffix = .zdhm
# roll a new file every 10 seconds, every 10 bytes, or every 5 events
# (deliberately small values for this demo)
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 10
a1.sinks.k1.hdfs.rollCount = 5
# use the agent's local time to resolve the %Y/%m/%d escapes
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# default is SequenceFile, which shows up serialized when viewed on HDFS
a1.sinks.k1.hdfs.fileType = DataStream
# must be set together with the option above; write events as plain text
a1.sinks.k1.hdfs.writeFormat = Text
# without this option, rollInterval/rollSize/rollCount may not take effect
# (files can be rolled early when blocks appear under-replicated)
a1.sinks.k1.hdfs.minBlockReplicas = 1

# Channel: buffer events in memory
a1.channels.c1.type=memory

# Wire the source and the sink together through the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the Flume agent:

./flume-ng agent -c conf -n a1 -f ../conf/flume-sink_hdfs.conf -Dflume.root.logger=INFO,console

Send data with nc:

$ nc localhost 8888
1
OK
2
OK
3
OK

Three finalized files are created under the HDFS directory, along with one in-use (temporary) file:

$ hdfs dfs -ls /input/flume/2018/03/25/ 
Found 4 items
-rw-r--r--   3 uplooking supergroup         10 2018-03-25 06:00 /input/flume/2018/03/25/http.1521928799720.log
-rw-r--r--   3 uplooking supergroup         11 2018-03-25 06:00 /input/flume/2018/03/25/http.1521928799721.log
-rw-r--r--   3 uplooking supergroup         15 2018-03-25 06:00 /input/flume/2018/03/25/http.1521928799722.log
-rw-r--r--   3 uplooking supergroup          3 2018-03-25 06:00 /input/flume/2018/03/25/xttzm.http.152192879
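
The contents of a rolled file can be checked with hdfs dfs -cat; given the nc session above, each file should hold the events that were batched into it (the file name below is taken from the listing):

hdfs dfs -cat /input/flume/2018/03/25/http.1521928799720.log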

2. Sinking Flume data to HBase

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

a1.sinks.k1.type = hbase
# HBase table name (in a properties file, comments must sit on their own line;
# trailing "# ..." text would otherwise become part of the value)
a1.sinks.k1.table = ns1:t12
# HBase column family
a1.sinks.k1.columnFamily = f1
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

a1.channels.c1.type=memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
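
The HBase sink does not create the target table, so ns1:t12 must exist before the agent starts. A minimal HBase shell sketch, assuming the ns1 namespace does not exist yet:

hbase shell
create_namespace 'ns1'
create 'ns1:t12', 'f1'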

Start the Flume agent:

./flume-ng agent -c conf -n a1 -f ../conf/flume-sink_hbase.conf -Dflume.root.logger=INFO,console
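
Once the agent is running, events sent through the netcat source should land in the table; with the default settings of RegexHbaseEventSerializer, the whole event body goes into a single column named payload. A quick check:

nc localhost 8888
hello hbase
OK

Then, in the HBase shell:

scan 'ns1:t12'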

3. Multi-hop agents using an Avro source and an Avro sink
1. Create two configuration files: [avro_hop.conf] for agent a1 and [avro-source.conf] for agent a2

# a1 [avro_hop.conf]
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

# the avro sink forwards events to the next hop on localhost:9999
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 9999

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

# a2 [avro-source.conf]
a2.sources = r2
a2.sinks = k2
a2.channels = c2

a2.sources.r2.type = avro
a2.sources.r2.bind = localhost
a2.sources.r2.port = 9999

a2.sinks.k2.type = logger

a2.channels.c2.type = memory

a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

2. Start a2 first, so the avro sink of a1 has something to connect to:

./flume-ng agent -c conf -n a2 -f ../conf/avro-source.conf -Dflume.root.logger=INFO,console

3. Verify that a2 is listening:

netstat -anop | grep 9999

4. Start a1:

./flume-ng agent -c conf -n a1 -f ../conf/avro_hop.conf -Dflume.root.logger=INFO,console

5. Verify that a1 is listening:

netstat -anop | grep 8888

6. Start nc and send data; the lines should appear on a2's console via the logger sink:

nc localhost 8888
hello
ni hao
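
The Avro source can also be fed directly, without going through a1, using the avro-client mode bundled with flume-ng (the file path here is just a placeholder for any local file to send):

./flume-ng avro-client -c conf -H localhost -p 9999 -F /tmp/test.log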

4. Sinking Flume data to Kafka

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=8888

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# target topic and broker address
a1.sinks.k1.kafka.topic = flume_source
a1.sinks.k1.kafka.bootstrap.servers = 192.168.100.11:9092
# number of events to batch into a single producer request
a1.sinks.k1.kafka.flumeBatchSize = 20
# require an acknowledgement from the partition leader for each write
a1.sinks.k1.kafka.producer.acks = 1

a1.channels.c1.type=memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
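
If the broker does not auto-create topics, flume_source must be created first; a sketch assuming Kafka 2.2+, where kafka-topics.sh accepts --bootstrap-server (older releases use --zookeeper instead):

bin/kafka-topics.sh --create --bootstrap-server 192.168.100.11:9092 --replication-factor 1 --partitions 1 --topic flume_source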

Start Flume:

./flume-ng agent -c conf -n a1 -f ../conf/flume-sink_kafka.conf -Dflume.root.logger=INFO,console
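
Then push a few events through the netcat source; each line becomes one Kafka message:

nc localhost 8888
hello kafka
OK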

Start a Kafka console consumer to view the consumed data:

bin/kafka-console-consumer.sh --bootstrap-server 192.168.100.11:9092 --topic flume_source --from-beginning

5. Using Kafka as a Flume source

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
# max events per batch written to the channel, and max wait time per batch
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = 192.168.100.11:9092
a1.sources.r1.kafka.topics = flume_source
a1.sources.r1.kafka.consumer.group.id = g4

a1.sinks.k1.type = logger

a1.channels.c1.type=memory
# transactionCapacity must be at least as large as the source batchSize
a1.channels.c1.capacity = 5000
a1.channels.c1.transactionCapacity = 5000

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

./flume-ng agent -c conf -n a1 -f ../conf/flume-source_kafka.conf -Dflume.root.logger=INFO,console

Start a Kafka console producer and send some data; each line should be printed on the Flume agent's console by the logger sink (the topic must match the one the source subscribes to):

bin/kafka-console-producer.sh --broker-list 192.168.100.11:9092 --topic flume_source

Reposted from blog.csdn.net/zht245648124/article/details/90140470