Using Flume to write data to Kafka and HDFS simultaneously

Environment

Component   Version                          Baidu network disk address
Flume       flume-ng-1.6.0-cdh5.7.0.tar.gz   https://pan.baidu.com/s/11QeF7rk2rqnOrFankr4TzA (extraction code: 3ojw)
Zookeeper   zookeeper-3.4.5                  https://pan.baidu.com/s/1upNcB53WGWP_89lhYnqP6g (extraction code: j50f)
Kafka       kafka_2.11-0.10.0.0.tgz          https://pan.baidu.com/s/1TpU6QPnoF1tuUy-7HnGgmQ (extraction code: aapj)

Zookeeper deployment: see Part 4

Flume deployment

  • Kafka deployment

# Unpack
[hadoop@hadoop001 soft]$ cd ~/soft
[hadoop@hadoop001 soft]$ tar -zxvf kafka_2.11-0.10.0.0.tgz -C ~/app/

# Change the data storage location
[hadoop@hadoop001 soft]$ cd ~/app/kafka_2.11-0.10.0.0/
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ mkdir -p ~/app/kafka_2.11-0.10.0.0/datalogdir
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ vim config/server.properties
log.dirs=/home/hadoop/app/kafka_2.11-0.10.0.0/datalogdir
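
While editing server.properties, a few other single-node settings are worth confirming. The original post only changes log.dirs; the lines below are the stock Kafka 0.10.x defaults, shown purely as a reference sketch:

# Defaults to double-check in config/server.properties (assumed, not from the original post)
broker.id=0
listeners=PLAINTEXT://:9092
zookeeper.connect=localhost:2181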

# Add environment variables
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ vim ~/.bash_profile
export KAFKA_HOME=/home/hadoop/app/kafka_2.11-0.10.0.0
export PATH=$KAFKA_HOME/bin:$PATH
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ source ~/.bash_profile
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ which kafka-topics.sh
~/app/kafka_2.11-0.10.0.0/bin/kafka-topics.sh

# Start the Kafka server
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-server-start.sh config/server.properties
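
Started this way, the broker occupies the terminal. If you prefer to run it in the background, kafka-server-start.sh also accepts a -daemon flag (a standard option in this Kafka version, though the original post does not use it):

# Optional: start the broker as a background daemon
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-server-start.sh -daemon config/server.properties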

# Test: create a topic
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wsk_test
# Test: list topics
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-topics.sh --list --zookeeper localhost:2181
# Test: console producer
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wsk_test
# Test: console consumer
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wsk_test --from-beginning
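
An optional sanity check before wiring Flume to the topic (not in the original post) is to describe it and confirm the partition count and replication factor:

# Describe the topic to verify its settings
[hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic wsk_test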
  • Configuring Flume jobs

 

 

Flume uses a TailDir source to collect data and, through a replicating channel selector, delivers it to both Kafka and HDFS. The configuration (saved as $FLUME_HOME/conf/Taildir-HdfsAndKafka-Agent.conf, matching the startup command below) is as follows:

Taildir-HdfsAndKafka-Agent.sources = taildir-source
Taildir-HdfsAndKafka-Agent.channels = c1 c2
Taildir-HdfsAndKafka-Agent.sinks = hdfs-sink kafka-sink

Taildir-HdfsAndKafka-Agent.sources.taildir-source.type = TAILDIR
Taildir-HdfsAndKafka-Agent.sources.taildir-source.filegroups = f1
Taildir-HdfsAndKafka-Agent.sources.taildir-source.filegroups.f1 = /home/hadoop/data/flume/HdfsAndKafka/input/.*
Taildir-HdfsAndKafka-Agent.sources.taildir-source.positionFile = /home/hadoop/data/flume/HdfsAndKafka/taildir_position/taildir_position.json
Taildir-HdfsAndKafka-Agent.sources.taildir-source.selector.type = replicating

Taildir-HdfsAndKafka-Agent.channels.c1.type = memory
Taildir-HdfsAndKafka-Agent.channels.c2.type = memory

Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.type = hdfs
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.path = hdfs://hadoop001:9000/flume/HdfsAndKafka/%Y%m%d%H%M
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.filePrefix = wsktest-
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.rollInterval = 10
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.rollSize = 100000000
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.rollCount = 0
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.fileType = DataStream
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.writeFormat = Text

Taildir-HdfsAndKafka-Agent.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
Taildir-HdfsAndKafka-Agent.sinks.kafka-sink.brokerList = localhost:9092
Taildir-HdfsAndKafka-Agent.sinks.kafka-sink.topic = wsk_test

Taildir-HdfsAndKafka-Agent.sources.taildir-source.channels = c1 c2
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.channel = c1
Taildir-HdfsAndKafka-Agent.sinks.kafka-sink.channel = c2
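
Note that both memory channels above run with Flume's default sizing (capacity and transactionCapacity of 100), which can overflow under bursty input. A tuning sketch with assumed values, not part of the original configuration:

# Assumed tuning, not in the original post: size the memory channels explicitly
Taildir-HdfsAndKafka-Agent.channels.c1.capacity = 10000
Taildir-HdfsAndKafka-Agent.channels.c1.transactionCapacity = 1000
Taildir-HdfsAndKafka-Agent.channels.c2.capacity = 10000
Taildir-HdfsAndKafka-Agent.channels.c2.transactionCapacity = 1000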

 

  • Startup command

flume-ng agent \
--name Taildir-HdfsAndKafka-Agent \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/Taildir-HdfsAndKafka-Agent.conf \
-Dflume.root.logger=INFO,console
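
Once the agent is running, a quick end-to-end smoke test (assuming the paths and topic from the configuration above) is to append a line under the monitored directory and watch it arrive on both sinks:

# Generate test data in the directory the TailDir source watches
[hadoop@hadoop001 ~]$ echo "hello flume" >> /home/hadoop/data/flume/HdfsAndKafka/input/test.log

# Check the HDFS sink (files roll every 10 seconds per rollInterval)
[hadoop@hadoop001 ~]$ hdfs dfs -ls /flume/HdfsAndKafka/

# Check the Kafka sink with the console consumer
[hadoop@hadoop001 ~]$ kafka-console-consumer.sh --zookeeper localhost:2181 --topic wsk_test --from-beginning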

 
