Introduction to Hadoop -- Storing Data to HDFS through Apache Flume

This notebook is based on Hadoop 2.7.3 and Apache Flume 1.8.0. The Flume source is netcat, the channel is memory, and the sink is HDFS.


1. Configure the flume agent file

Configure a Flume agent, here named shaman. The configuration file (netcat-memory-hdfs.conf) is as follows:

# Identify the components on agent shaman:
shaman.sources = netcat_s1
shaman.sinks = hdfs_w1
shaman.channels = in-mem_c1
# Configure the source:
shaman.sources.netcat_s1.type = netcat
shaman.sources.netcat_s1.bind = localhost
shaman.sources.netcat_s1.port = 44444
# Describe the sink:
shaman.sinks.hdfs_w1.type = hdfs
shaman.sinks.hdfs_w1.hdfs.path = hdfs://localhost:8020/user/root/test
shaman.sinks.hdfs_w1.hdfs.writeFormat = Text
shaman.sinks.hdfs_w1.hdfs.fileType = DataStream

# Configure a channel that buffers events in memory:
shaman.channels.in-mem_c1.type = memory
shaman.channels.in-mem_c1.capacity = 20000
shaman.channels.in-mem_c1.transactionCapacity = 100
# Bind the source and sink to the channel:
shaman.sources.netcat_s1.channels = in-mem_c1
shaman.sinks.hdfs_w1.channel = in-mem_c1

Note:
In hdfs://localhost:8020/user/root/test, hdfs://localhost:8020 is the value of the fs.defaultFS property (set in core-site.xml), and root is the login user of Hadoop. You can check the value with: hdfs getconf -confKey fs.defaultFS
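With the defaults, the HDFS sink rolls to a new file every 30 seconds, at 1024 bytes, or after 10 events, whichever comes first, which tends to produce many small files. As an optional sketch, the same configuration file could relax the roll thresholds (the property names are from the Flume 1.8.0 User Guide; the values here are only illustrative):

```
# Roll a new file every 10 minutes or at ~128 MB, whichever comes first;
# rollCount = 0 disables event-count-based rolling.
shaman.sinks.hdfs_w1.hdfs.rollInterval = 600
shaman.sinks.hdfs_w1.hdfs.rollSize = 134217728
shaman.sinks.hdfs_w1.hdfs.rollCount = 0
```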

2. Start the flume agent

bin/flume-ng agent -f agent/netcat-memory-hdfs.conf -n shaman -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true

Here -f points to the agent configuration file, -n gives the agent name (it must match the property prefix used in the configuration, shaman), and the -D options enable verbose logging to the console.

3. Open the telnet client and enter some test text

telnet localhost 44444

Then type some lines of text; the netcat source turns each newline-terminated line into a Flume event.
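telnet works well interactively; as a scriptable alternative, here is a minimal Python sketch that opens a socket to the netcat source and sends lines, reading back the per-event "OK" acknowledgement that the netcat source returns by default. The host, port, and sample lines are assumptions matching the configuration above.

```python
import socket


def send_events(lines, host="localhost", port=44444):
    """Send each line to the Flume netcat source as one event.

    The netcat source treats every newline-terminated line as an event
    and, with its default settings, acknowledges each one with "OK".
    Returns the list of acknowledgements received.
    """
    replies = []
    with socket.create_connection((host, port)) as sock:
        f = sock.makefile("rwb")
        for line in lines:
            f.write(line.encode() + b"\n")
            f.flush()
            replies.append(f.readline().strip().decode())
    return replies


# Example (assumes the shaman agent from step 2 is running):
# send_events(["hello flume", "hdfs test"])
```

Each call opens one connection, so a batch of lines arrives as a burst of events that the memory channel buffers until the HDFS sink drains them.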

4. View the hdfs test directory

hdfs dfs -ls /user/root/test

You will find that a new file has appeared (by default Flume names these files with the FlumeData prefix), and its content is the text entered through telnet. You can print it with: hdfs dfs -cat /user/root/test/FlumeData.*


Learning materials:
1. "Hadoop For Dummies"
2. Flume 1.8.0 User Guide
