Flume Environment Installation and Deployment

Flume NG Deployment

1. Download the Flume installation package.
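
For example, the binary tarball can be fetched from the Apache archive and unpacked. The version number, download URL, and install location below are only assumptions for illustration; adjust them to your environment.

[hadoop@master ~]$ wget https://archive.apache.org/dist/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
[hadoop@master ~]$ tar -zxvf apache-flume-1.8.0-bin.tar.gz
[hadoop@master ~]$ mv apache-flume-1.8.0-bin flume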

2. Switch to the hadoop user and go to the flume/conf directory.

[root@master java]$ su hadoop
[hadoop@master java]$ cd flume/conf
[hadoop@master conf]$ ls
flume-conf.properties.template  flume-env.ps1.template  flume-env.sh.template  log4j.properties

Copy flume-conf.properties.template to create a flume-conf.properties configuration file.

[hadoop@master conf]$ cp flume-conf.properties.template flume-conf.properties
[hadoop@master conf]$ ls
flume-conf.properties	flume-conf.properties.template  flume-env.ps1.template  flume-env.sh.template  log4j.properties

Edit the flume-conf.properties configuration file on the master node.

[hadoop@master conf]$ vi flume-conf.properties
# Define source, channel, sink (names of the three components)
agent1.sources = spool-source1
agent1.channels = ch1
agent1.sinks = hdfs-sink1

# Define and configure a spooling directory source (use spooldir to monitor the log directory)
agent1.sources.spool-source1.channels = ch1
agent1.sources.spool-source1.type = spooldir
agent1.sources.spool-source1.spoolDir = /home/hadoop/data
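# Files whose names match the regex below (event_<date>_<time>.log files and their .COMPLETED markers) are skipped;
# deserializer.maxLineLength raises the per-event line limit to 10240 characters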
agent1.sources.spool-source1.ignorePattern = event(_\d{4}\-\d{2}\-\d{2}_\d{2}_\d{2})?\.log(\.COMPLETED)?
agent1.sources.spool-source1.deserializer.maxLineLength = 10240

# Configure channel (use a file channel to prevent data loss)
agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /home/hadoop/flume/checkpointDir
agent1.channels.ch1.dataDirs = /home/hadoop/flume/dataDirs

# Define and configure an HDFS sink (deliver the collected data to HDFS)
agent1.sinks.hdfs-sink1.channel = ch1
agent1.sinks.hdfs-sink1.type = hdfs
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://master:9000/flume/%Y%m%d
agent1.sinks.hdfs-sink1.hdfs.useLocalTimeStamp = true
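# Roll a new HDFS file every 300 seconds or once it reaches 64 MB; rollCount = 0 disables rolling by event count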
agent1.sinks.hdfs-sink1.hdfs.rollInterval = 300
agent1.sinks.hdfs-sink1.hdfs.rollSize = 67108864
agent1.sinks.hdfs-sink1.hdfs.rollCount = 0
#agent1.sinks.hdfs-sink1.hdfs.codeC = snappy

If Snappy compression is installed in Hadoop, Flume can enable compression when collecting data:

agent1.sinks.hdfs-sink1.hdfs.codeC = snappy

If Snappy is not installed, leave this setting commented out, as in the configuration above.
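
Whether the Hadoop native Snappy library is available can be checked with hadoop checknative; if the snappy entry reports true, the compression setting above can be enabled.

[hadoop@master ~]$ hadoop checknative -a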

3. First, make sure the Hadoop cluster is running normally.
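
A quick way to verify this is to run jps on each node and confirm that the HDFS daemons (NameNode on master, DataNodes on the worker nodes) are present; assuming a standard Hadoop installation, start-dfs.sh can bring HDFS up if it is not already running.

[hadoop@master ~]$ jps
[hadoop@master ~]$ start-dfs.sh   # only if the HDFS daemons are not already running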

4. Start Flume on the master node.

[hadoop@master flume]$ bin/flume-ng agent -n agent1 -f conf/flume-conf.properties

Note that -n specifies the agent name, and -f is followed by the configuration file to use.
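
For testing it is often handy to also pass the configuration directory and print the agent's log to the console; the flume-ng options below are standard, but this is just one way to run it:

[hadoop@master flume]$ bin/flume-ng agent -n agent1 -c conf -f conf/flume-conf.properties -Dflume.root.logger=INFO,console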

5. Pick any local file (for example test.txt) and put it into the monitored directory /home/hadoop/data. The console should then print log messages showing Flume collecting the data.
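
For example (if the spool directory does not exist yet, create it first; the file name test.txt is just an example):

[hadoop@master ~]$ mkdir -p /home/hadoop/data
[hadoop@master ~]$ cp test.txt /home/hadoop/data/

Once the file has been processed, the spooldir source renames it to test.txt.COMPLETED inside the monitored directory.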

6. Check the data collected by Flume in the HDFS web UI. If the collected data appears there, Flume has collected it successfully.

http://master:50070/dfshealth.html#tab-overview
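
The result can also be checked from the command line; the date-based subdirectory below comes from the %Y%m%d escape in hdfs.path.

[hadoop@master ~]$ hdfs dfs -ls /flume/$(date +%Y%m%d)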

This concludes the walkthrough of this Flume application scenario.

Reposted from blog.csdn.net/pigziprogrammer/article/details/94489555