Delivering Data to HDFS with Flume

Copyright notice: this is the author's original article and may not be reproduced without permission. https://blog.csdn.net/yulei_qq/article/details/82453592

This example uses netcat (the networking "Swiss-army knife") as the Flume source and HDFS as the sink.

The Flume configuration file (hdfs_k.conf) is as follows:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://mycluster/flume/events/%y-%m-%d/%H/%M/%S
a1.sinks.k1.hdfs.filePrefix = events-

# Round down the event timestamp so a new bucket directory is created every
# 10 seconds (roundValue/roundUnit control the directory granularity), e.g.
# 2017-12-12 --> 2017-12-12/%H%M%S

a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = second

# Use the agent's local time for the %y-%m-%d/%H/%M/%S escapes
# (no timestamp header is required from the source).
a1.sinks.k1.hdfs.useLocalTimeStamp=true

# Roll to a new file when any of these thresholds is reached:
# every 10 seconds, every 10 bytes, or every 3 events.
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.hdfs.rollSize=10
a1.sinks.k1.hdfs.rollCount=3

a1.channels.c1.type=memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
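
The memory channel above relies on Flume's default sizing. If events arrive faster than the sink can drain them, the standard capacity properties can be added to the same channel; the values below are only illustrative and are not part of the original configuration:

a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100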

Here, hdfs://mycluster is the address of the HDFS cluster.
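
If you are not sure what the cluster address is, it can be read from the Hadoop client configuration; a quick check (assuming fs.defaultFS points at the nameservice):

[hadoop@s202 ~]$hdfs getconf -confKey fs.defaultFS
hdfs://mycluster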

Run flume-ng agent -f hdfs_k.conf -n a1 to start the Flume agent:

[hadoop@s202 /soft/flume/conf]$flume-ng agent  -f hdfs_k.conf  -n a1
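
If you want the configuration directory and console logging to be explicit, the same agent can also be started with the standard extra options (the --conf path below is taken from the shell prompt and is an assumption):

[hadoop@s202 /soft/flume/conf]$flume-ng agent --conf /soft/flume/conf -f hdfs_k.conf -n a1 -Dflume.root.logger=INFO,console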

Start the netcat client and type a few lines (the source replies OK for each accepted event):

[hadoop@s202 ~]$nc localhost  8888
he ll sss adsd
OK
hellow ssdfvas fffg
OK

The Flume console then logs the bucket directories and files being created for the collected events:

18/09/06 08:52:12 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
18/09/06 08:52:12 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:8888]
18/09/06 08:52:42 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
18/09/06 08:52:42 INFO hdfs.BucketWriter: Creating hdfs://mycluster/flume/events/18-09-06/08/52/40/events-.1536195162064.tmp
18/09/06 08:52:51 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
18/09/06 08:52:51 INFO hdfs.BucketWriter: Creating hdfs://mycluster/flume/events/18-09-06/08/52/50/events-.1536195171721.tmp
18/09/06 08:52:56 INFO hdfs.BucketWriter: Closing hdfs://mycluster/flume/events/18-09-06/08/52/40/events-.1536195162064.tmp
18/09/06 08:52:56 INFO hdfs.BucketWriter: Renaming hdfs://mycluster/flume/events/18-09-06/08/52/40/events-.1536195162064.tmp to hdfs://mycluster/flume/events/18-09-06/08/52/40/events-.1536195162064
18/09/06 08:52:56 INFO hdfs.HDFSEventSink: Writer callback called.
18/09/06 08:53:01 INFO hdfs.BucketWriter: Closing hdfs://mycluster/flume/events/18-09-06/08/52/50/events-.1536195171721.tmp
18/09/06 08:53:02 INFO hdfs.BucketWriter: Renaming hdfs://mycluster/flume/events/18-09-06/08/52/50/events-.1536195171721.tmp to hdfs://mycluster/flume/events/18-09-06/08/52/50/events-.1536195171721
18/09/06 08:53:02 INFO hdfs.HDFSEventSink: Writer callback called.
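
Before reading the files, you can list the bucket directories Flume created (the date directory matches the log output above):

[hadoop@s202 ~]$hdfs dfs -ls -R /flume/events/18-09-06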

The resulting file can be viewed with the following command:

[hadoop@s202 ~]$hdfs dfs  -cat /flume/events/18-09-06/08/52/50/events-.1536195171721
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable?.~?;...s.e..hellow ssdfvas fffg?.~?;...s.[hadoop@s202 ~]$

As you can see, this is a Hadoop SequenceFile; the text "hellow ssdfvas fffg" is faintly visible in the binary output.

It can also be viewed with:

[hadoop@s202 ~]$hdfs dfs  -text  /flume/events/18-09-06/08/52/50/events-.1536195171721
1536195171972	68 65 6c 6c 6f 77 20 73 73 64 66 76 61 73 20 66 66 66 67
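
As the writeFormat = Writable lines in the log show, the HDFS sink writes SequenceFiles by default. If plain text files are preferred, the sink can be switched to a data stream with a text write format; a sketch of the two extra properties (not part of the original configuration):

a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text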
