This notebook is based on Hadoop 2.7.3 and Apache Flume 1.8.0. The flume source is netcat, the flume channel is memory, and the flume sink is HDFS.
1. Configure the flume agent file
Configure a flume agent, here named shaman. The configuration file (netcat-memory-hdfs.conf) is as follows:
# Identify the components on agent shaman:
shaman.sources = netcat_s1
shaman.sinks = hdfs_w1
shaman.channels = in-mem_c1
# Configure the source:
shaman.sources.netcat_s1.type = netcat
shaman.sources.netcat_s1.bind = localhost
shaman.sources.netcat_s1.port = 44444
# Describe the sink:
shaman.sinks.hdfs_w1.type = hdfs
shaman.sinks.hdfs_w1.hdfs.path = hdfs://localhost:8020/user/root/test
shaman.sinks.hdfs_w1.hdfs.writeFormat = Text
shaman.sinks.hdfs_w1.hdfs.fileType = DataStream
# Configure a channel that buffers events in memory:
shaman.channels.in-mem_c1.type = memory
shaman.channels.in-mem_c1.capacity = 20000
shaman.channels.in-mem_c1.transactionCapacity = 100
# Bind the source and sink to the channel:
shaman.sources.netcat_s1.channels = in-mem_c1
shaman.sinks.hdfs_w1.channel = in-mem_c1
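For the agent to start, the wiring above must be consistent: every source's channels list and every sink's channel must name a channel declared on the agent. As an illustration only (this is not part of Flume), here is a small Python sketch that parses a properties-style agent file and checks that wiring, using the same key layout as netcat-memory-hdfs.conf:

```python
# Illustration only: validate the channel wiring of a Flume-style
# properties file. This is NOT a Flume API; it just mirrors the
# key layout used in netcat-memory-hdfs.conf above.

CONF = """
shaman.sources = netcat_s1
shaman.sinks = hdfs_w1
shaman.channels = in-mem_c1
shaman.sources.netcat_s1.channels = in-mem_c1
shaman.sinks.hdfs_w1.channel = in-mem_c1
"""

def parse(text):
    """Parse 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_wiring(props, agent):
    """Return the names of sources/sinks bound to an undeclared channel."""
    channels = set(props[f"{agent}.channels"].split())
    problems = []
    for src in props[f"{agent}.sources"].split():
        bound = set(props.get(f"{agent}.sources.{src}.channels", "").split())
        if not bound or not bound <= channels:
            problems.append(src)
    for sink in props[f"{agent}.sinks"].split():
        # Note: sinks use the singular key "channel", sources the plural "channels".
        if props.get(f"{agent}.sinks.{sink}.channel") not in channels:
            problems.append(sink)
    return problems

print(check_wiring(parse(CONF), "shaman"))  # → [] (wiring is consistent)
```

A typo in the channel name (a common startup failure) would show up here as a non-empty list.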
Note:
In the sink path hdfs://localhost:8020/user/root/test, the prefix hdfs://localhost:8020 is the value of the fs.defaultFS property (set in core-site.xml; you can check it with hdfs getconf -confKey fs.defaultFS), and root is the user logged in to Hadoop.
2. Start the flume agent
bin/flume-ng agent -f agent/netcat-memory-hdfs.conf -n shaman -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
3. Open a telnet client and enter some test text
telnet localhost 44444
Then type a few lines of text; each newline-terminated line becomes one event.
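Under the hood, the netcat source simply accepts newline-terminated lines over TCP and acknowledges each event with an OK reply. The following self-contained Python sketch demonstrates that round trip with a local stand-in server (a simulation, not Flume itself), so it runs without a cluster; against a real agent the client would target localhost:44444 instead:

```python
import socket
import threading

# Simulated stand-in for a Flume netcat source: accepts one
# connection, reads a newline-terminated event, acks it with "OK".
def fake_netcat_source(server_sock, events):
    conn, _ = server_sock.accept()
    with conn:
        buf = b""
        while b"\n" not in buf:
            chunk = conn.recv(1024)
            if not chunk:
                break
            buf += chunk
        line, _, _ = buf.partition(b"\n")
        events.append(line.decode())
        conn.sendall(b"OK\n")

def send_event(text):
    events = []
    server = socket.socket()
    server.bind(("localhost", 0))   # ephemeral port; the real agent listens on 44444
    server.listen(1)
    port = server.getsockname()[1]
    t = threading.Thread(target=fake_netcat_source, args=(server, events))
    t.start()
    with socket.create_connection(("localhost", port)) as client:
        client.sendall(text.encode() + b"\n")   # one event per line
        ack = client.recv(1024).decode().strip()
    t.join()
    server.close()
    return events, ack

events, ack = send_event("hello flume")
print(events, ack)  # → ['hello flume'] OK
```

This mirrors what happens in the telnet session above: each line you type is one event, and the agent answers OK.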
4. View the hdfs test directory
hdfs dfs -ls /user/root/test
You will find that a new file has appeared, containing the text entered through telnet. To print its contents:
hdfs dfs -cat /user/root/test/*
Learning materials:
1. "Hadoop For Dummies"
2. Flume 1.8.0 User Guide