flume-zookeeper-kafka-hdfs Setup: Complete Guide

1. Requirements

Use Flume to collect logs (directories or files) from multiple servers, push them to a Kafka server for real-time consumption and display, and also store them in HDFS.

2. Preparation

2.1 Virtual machines

At least 4 virtual machines (Hadoop needs at least 3 VMs to build the cluster, plus one VM that acts as the log source Flume collects from):

hadoop1 192.168.48.144

hadoop2 192.168.48.145

hadoop3 192.168.48.146

flume1 192.168.48.147

2.2 Required files

jdk-8u101-linux-x64.gz

apache-flume-1.7.0-bin.tar.gz

zookeeper-3.4.8.tar.gz

kafka_2.11-0.10.2.0.tgz

hadoop-2.7.3.tar.gz

2.3 Creating the virtual machines

One log-source VM: flume1

Three Hadoop machines: hadoop1, hadoop2, hadoop3 (you can create hadoop1 first as the NameNode; once it is configured, use VMware cloning to produce the remaining two DataNodes)

3. Hadoop setup

To keep the setup quick, I did not create a dedicated user or group and used root directly.

3.1 Create a VM in VMware

Edit the network interface configuration:

vi /etc/sysconfig/network-scripts/ifcfg-eth0

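A minimal sketch of a static-IP configuration for hadoop1 (this assumes a CentOS 6-style ifcfg file; the gateway and DNS values are placeholders for your own network):

DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.48.144
NETMASK=255.255.255.0
GATEWAY=192.168.48.2   # assumed VMware NAT gateway, adjust to your network
DNS1=192.168.48.2      # assumed, adjust to your network
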
3.2 Edit the hosts file

vi /etc/hosts

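Add entries that map the hostnames to the IP addresses listed in section 2.1:

192.168.48.144 hadoop1
192.168.48.145 hadoop2
192.168.48.146 hadoop3
192.168.48.147 flume1
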
3.3 Edit the network file

vi /etc/sysconfig/network

Change the hostname to hadoop1.

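Assuming a CentOS 6-style /etc/sysconfig/network, the file would look roughly like this:

NETWORKING=yes
HOSTNAME=hadoop1
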
The change takes effect after a reboot.

3.4 Disable the firewall and SELinux

Permanently disable the firewall: chkconfig --level 35 iptables off

Permanently disable SELinux:

vim /etc/selinux/config

Find the SELINUX line and change it to SELINUX=disabled

3.5 Install the JDK

Place the JDK archive in /usr/local/xialei/java.

cd into that directory and extract it:

[root@hadoop1 java]# tar -zxvf jdk-8u101-linux-x64.gz

Configure the environment variables:

[root@hadoop1 java]# vi /etc/profile

Append the following lines at the bottom of the file:

export JAVA_HOME=/usr/local/xialei/java/jdk1.8.0_101

export PATH=$PATH:$JAVA_HOME/bin

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Apply the changes:

source /etc/profile

Verify the installation:

[root@hadoop1 java]# java -version

 

3.6 Clone the virtual machines

In VMware, clone hadoop1 into two more VMs named hadoop2 and hadoop3. On each of the two clones, edit the /etc/hosts file and change the hostname (and give each clone its own IP address) accordingly.

3.7 Configure passwordless SSH between the three machines

I won't elaborate here; a commonly used approach is sketched below.

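A minimal sketch (run on each of hadoop1, hadoop2, and hadoop3; ssh-copy-id is assumed to be available, otherwise append the public key to ~/.ssh/authorized_keys by hand):

ssh-keygen -t rsa        # accept the defaults
ssh-copy-id root@hadoop1
ssh-copy-id root@hadoop2
ssh-copy-id root@hadoop3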
 

Verify that passwordless login works.

You may still be asked for a password the first time.

3.8 Edit the Hadoop configuration files and set the Hadoop environment variables

Create an hdfs directory and extract hadoop-2.7.3.tar.gz:

[root@hadoop1 hdfs]# tar -zxvf hadoop-2.7.3.tar.gz
vi /etc/profile

Append two lines at the end:

export HADOOP_HOME=/usr/local/xialei/hdfs/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin

Reload /etc/profile and check the Hadoop environment variables, for example:

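A quick check; the version command ships with Hadoop:

source /etc/profile
hadoop version     # should report Hadoop 2.7.3
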
Now edit the configuration files: go into the etc/hadoop directory

[root@hadoop1 hdfs]# cd hadoop-2.7.3/etc/hadoop

3.8.1 Edit core-site.xml

[root@hadoop1 hadoop]# vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
      <name>fs.default.name</name>
      <value>hdfs://hadoop1:9000</value>
  </property>
  <property>
      <name>hadoop.tmp.dir</name>
      <value>/usr/local/xialei/hadoop/tmp</value>
  </property>
</configuration>

3.8.2 Edit hdfs-site.xml

[root@hadoop1 hadoop]# vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
    <name>dfs.replication</name>
        <value>2</value>
 </property>
 <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/usr/local/xialei/hadoop/dfs/name</value>
 </property>
 <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/usr/local/xialei/hadoop/dfs/data</value>
 </property>
</configuration>

3.8.3 First make a copy of mapred-site.xml

[root@hadoop1 hadoop]# cp mapred-site.xml.template mapred-site.xml

Edit the mapred-site.xml file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
 <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
  </property>
  <property>
         <name>mapreduce.jobhistory.address</name>
         <value>hadoop1:10020</value>
   </property>
   <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>hadoop1:19888</value>
     </property>
</configuration>

3.8.4 Edit the yarn-site.xml file

[root@hadoop1 hadoop]# vi yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
	<property>
		<name>yarn.resourcemanager.address</name>
		<value>hadoop1:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address</name>
		<value>hadoop1:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address</name>
		<value>hadoop1:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address</name>
		<value>hadoop1:8033</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address</name>
		<value>hadoop1:8088</value>
	</property>
</configuration>

3.8.5 Edit hadoop-env.sh

[root@hadoop1 hadoop]# vi hadoop-env.sh

Find the JAVA_HOME line and set it to your JDK path:

export JAVA_HOME=/usr/local/xialei/java/jdk1.8.0_101

3.8.6 Edit the slaves file

[root@hadoop1 hadoop]# vi slaves
#localhost
hadoop2
hadoop3

That completes the Hadoop configuration files.

3.9 Verify the configuration

cd into Hadoop's bin directory:

[root@hadoop1 hdfs]# cd hadoop-2.7.3/bin/

Format the NameNode (note: this only needs to be run once, if it succeeds):

[root@hadoop1 bin]# ./hdfs namenode -format

Check the printed log; if it exits with status 0, the configuration is correct.

3.10 Distribute the Hadoop files to the hadoop2 and hadoop3 nodes

[root@hadoop1 /]# scp -r /usr/local/xialei/hdfs/ hadoop2:/usr/local/xialei/hdfs/

[root@hadoop1 /]# scp -r /usr/local/xialei/hdfs/ hadoop3:/usr/local/xialei/hdfs/

This may take a while; you can also compress the directory first and transfer the archive.

After the transfer, remember to adjust the slaves file on hadoop2 and hadoop3 and to configure the Hadoop environment variables on them as well.

3.11 Start Hadoop

Once the files have been transferred, it is time to start Hadoop.

You only need to do this on the master node (hadoop1):

[root@hadoop1 /]# cd /usr/local/xialei/hdfs/hadoop-2.7.3/sbin/

In sbin, run:

[root@hadoop1 sbin]# ./start-all.sh

You can use the jps command on hadoop1, hadoop2, and hadoop3 to check the result.

The jps output on the three VMs shows that the daemons started successfully; roughly, you should see the following.
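As a rough sketch (process IDs omitted; exact output will differ), jps on each node should show something like:

[root@hadoop1 ~]# jps
NameNode
SecondaryNameNode
ResourceManager
Jps

[root@hadoop2 ~]# jps
DataNode
NodeManager
Jps

hadoop3 should look the same as hadoop2.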

3.12 Test

192.168.48.144 is the IP address of my hadoop1:

http://192.168.48.144:8088 (YARN ResourceManager web UI)

http://192.168.48.144:50070/explorer.html#/ (NameNode web UI, file browser)

Here you can browse the files stored in HDFS.

3.13 Test uploading a file to HDFS

[root@hadoop1 /]# cd /usr/local/xialei/test_hdfs/

Create a new file:

[root@hadoop1 test_hdfs]# vi test_xl.txt

 
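The file contents were only shown in a screenshot; judging from the WordCount output in section 3.14 (six words, one occurrence each), it contained something like:

aa
bb
cc
dd
ee
ff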

Upload it to HDFS (into the /flume1 directory):

[root@hadoop1 test_hdfs]# hdfs dfs -put test_xl.txt /flume1/

Check again in the browser: the file appears and can be clicked to download.

The upload succeeded and the file is downloadable.

Clicking download may fail because the link points at the hostname hadoop2; just replace hadoop2 in the URL with the corresponding IP (192.168.48.145) and it will work.

It is worth getting familiar with the HDFS shell commands; see the official Hadoop FileSystem Shell documentation.
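A few commonly used HDFS shell commands for reference:

hdfs dfs -ls /flume1/                 # list a directory
hdfs dfs -cat /flume1/test_xl.txt     # print a file
hdfs dfs -get /flume1/test_xl.txt .   # download a file to the local disk
hdfs dfs -rm -r /flume1/out           # delete a directory recursively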

3.14 Run WordCount

We will test with the file we just uploaded.

First go into Hadoop's share/hadoop/mapreduce directory, which contains the example jar we need.

Run:

[root@hadoop1 mapreduce]# hadoop jar  hadoop-mapreduce-examples-2.7.3.jar   wordcount   /flume1/test_xl   /flume1/out

Let's break down the arguments. After the jar there are three arguments; the first is the program name. Here is what comes back if we leave that argument out:

[root@hadoop1 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.3.jar  /flume1/test_xl   /flume1/out
Unknown program '/flume1/test_xl' chosen.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

"Valid program names are" asks us to pick the program we want to run.

The remaining two arguments are paths: the first is the input file (or directory) to analyze, and the second is the output directory for the results.

Output of the run:

[root@hadoop1 mapreduce]# hadoop jar  hadoop-mapreduce-examples-2.7.3.jar   wordcount   /flume1/test_xl   /flume1/out
19/03/26 04:29:16 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/192.168.48.153:8032
19/03/26 04:29:19 INFO input.FileInputFormat: Total input paths to process : 1
19/03/26 04:29:19 INFO mapreduce.JobSubmitter: number of splits:1
19/03/26 04:29:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553586070696_0003
19/03/26 04:29:21 INFO impl.YarnClientImpl: Submitted application application_1553586070696_0003
19/03/26 04:29:21 INFO mapreduce.Job: The url to track the job: http://hadoop1:8088/proxy/application_1553586070696_0003/
19/03/26 04:29:21 INFO mapreduce.Job: Running job: job_1553586070696_0003
19/03/26 04:29:45 INFO mapreduce.Job: Job job_1553586070696_0003 running in uber mode : false
19/03/26 04:29:45 INFO mapreduce.Job:  map 0% reduce 0%
19/03/26 04:30:00 INFO mapreduce.Job:  map 100% reduce 0%
19/03/26 04:30:23 INFO mapreduce.Job:  map 100% reduce 100%
19/03/26 04:30:23 INFO mapreduce.Job: Job job_1553586070696_0003 completed successfully
19/03/26 04:30:23 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=60
		FILE: Number of bytes written=237501
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=117
		HDFS: Number of bytes written=30
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=12814
		Total time spent by all reduces in occupied slots (ms)=18202
		Total time spent by all map tasks (ms)=12814
		Total time spent by all reduce tasks (ms)=18202
		Total vcore-milliseconds taken by all map tasks=12814
		Total vcore-milliseconds taken by all reduce tasks=18202
		Total megabyte-milliseconds taken by all map tasks=13121536
		Total megabyte-milliseconds taken by all reduce tasks=18638848
	Map-Reduce Framework
		Map input records=6
		Map output records=6
		Map output bytes=42
		Map output materialized bytes=60
		Input split bytes=99
		Combine input records=6
		Combine output records=6
		Reduce input groups=6
		Reduce shuffle bytes=60
		Reduce input records=6
		Reduce output records=6
		Spilled Records=12
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=439
		CPU time spent (ms)=4400
		Physical memory (bytes) snapshot=299560960
		Virtual memory (bytes) snapshot=4129812480
		Total committed heap usage (bytes)=138854400
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=18
	File Output Format Counters 
		Bytes Written=30

A new /flume1/out directory has appeared in HDFS (the output directory is created automatically).

View the result:

[root@hadoop1 mapreduce]# hdfs dfs -cat /flume1/out/part-r-00000
aa	1
bb	1
cc	1
dd	1
ee	1
ff	1

4. Kafka and ZooKeeper deployment

With HDFS in place, we install Kafka and ZooKeeper on the master node, hadoop1.

4.1 Configure ZooKeeper

4.1.1 Install

Extract the archive, cd into its conf directory, and copy the template configuration file; a sketch of these commands follows.

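A sketch of these steps (assuming the archive sits in /usr/local/xialei/kafka_zk/zk, which matches the paths used in zoo.cfg below):

cd /usr/local/xialei/kafka_zk/zk
tar -zxvf zookeeper-3.4.8.tar.gz
cd zookeeper-3.4.8/conf
cp zoo_sample.cfg zoo.cfg
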
Edit the file: vi zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/usr/local/xialei/kafka_zk/zk/zookeeper-3.4.8/zkdata
dataLogDir=/usr/local/xialei/kafka_zk/zk/zookeeper-3.4.8/zkdatalog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

The directories referenced by dataDir and dataLogDir must be created beforehand. For more details on the ZooKeeper configuration, see:

ZooKeeper installation and deployment guide

4.1.2 Configure the ZooKeeper environment variables

Edit: vi /etc/profile

Append three lines:

export ZOOKEEPER_HOME=/usr/local/xialei/kafka_zk/zk/zookeeper-3.4.8
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export CLASSPATH=.:$ZOOKEEPER_HOME/lib

4.1.3 Start ZooKeeper

cd into ZooKeeper's bin directory and start the server; a sketch follows.

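A sketch using the standard scripts shipped with ZooKeeper:

cd /usr/local/xialei/kafka_zk/zk/zookeeper-3.4.8/bin
./zkServer.sh start
./zkServer.sh status    # should report that it is running (standalone mode)
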
4.1.4 Check that it started

netstat -tunlp|grep 2181

Check whether port 2181 is listening.

4.2 Configure Kafka

4.2.1 Extract

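A sketch of the extraction (assuming the archive sits in /usr/local/xialei/kafka_zk/kaf, which matches the log.dirs path used below):

cd /usr/local/xialei/kafka_zk/kaf
tar -zxvf kafka_2.11-0.10.2.0.tgz
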
4.2.2 Edit server.properties

cd into Kafka's config directory and edit server.properties:

broker.id=0  
num.network.threads=2  
num.io.threads=8  
socket.send.buffer.bytes=1048576  
socket.receive.buffer.bytes=1048576 
socket.request.max.bytes=104857600
num.partitions=2
log.retention.hours=168  
log.segment.bytes=536870912  
log.retention.check.interval.ms=60000  
log.cleaner.enable=false  
zookeeper.connect=localhost:2181  
zookeeper.connection.timeout.ms=1000000  
log.dirs=/usr/local/xialei/kafka_zk/kaf/kafka_2.11-0.10.2.0/kafka_log
listeners=PLAINTEXT://192.168.48.144:9092

The directory referenced by log.dirs must be created beforehand; for the full list of settings, see the Kafka configuration parameter reference.

The log.dirs property is where Kafka stores its data.

4.2.3 Start Kafka

cd into Kafka's bin directory and start the broker, as sketched below.

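A sketch of starting the broker in the background with the standard startup script:

cd /usr/local/xialei/kafka_zk/kaf/kafka_2.11-0.10.2.0/bin
nohup ./kafka-server-start.sh ../config/server.properties > kafka.out 2>&1 &
netstat -tunlp|grep 9092    # confirm the broker is listening on 9092
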
Note: once it has started successfully, leave it running in the background and open a new SSH session for the next steps.

Check whether port 9092 is listening.

4.2.4 Kafka topics

Create a topic (run from Kafka's bin directory):

sh kafka-topics.sh --create --topic xltopic --replication-factor 1 --partitions 1 --zookeeper localhost:2181

List all topics:

sh kafka-topics.sh --list --zookeeper 192.168.48.144:2181

Describe a specific topic:

sh kafka-topics.sh --describe --zookeeper localhost:2181 --topic xltopic

Start a console producer (for testing only; optional). Once it is running, leave it in the background and open another SSH session:

sh kafka-console-producer.sh --broker-list 192.168.48.144:9092 --sync --topic xltopic

Start a console consumer:

sh kafka-console-consumer.sh --zookeeper localhost:2181 --topic xltopic --from-beginning

(--from-beginning consumes the topic from the beginning; without it, only messages sent to the topic from now on are consumed.)

If you started the producer, type something in the producer terminal and press Enter; the text should immediately show up in the consumer terminal.

At this point, the ZooKeeper + Kafka side of the setup is complete.

5. Flume deployment

Create a new VM named flume1 to run Flume. This machine acts as the log source (it hosts the directories/files we want to monitor).

5.1 Extract

 tar -zxvf apache-flume-1.7.0-bin.tar.gz

5.2 Prepare the configuration

cd into Flume's conf directory and copy the template configuration file:

cp flume-conf.properties.template spooldir-kafka-hdfs.conf

5.3 Edit spooldir-kafka-hdfs.conf

# name of the source
agent.sources = s
# names of the channels
agent.channels = c c1
# names of the sinks
agent.sinks = r r1
# For each one of the sources, the type is defined
# exec monitors a file, spooldir monitors a directory
#agent.sources.s.type = exec
#agent.sources.s.command = tail -n +0 -F /opt/testLog/test.log
agent.sources.s.type = spooldir
agent.sources.s.spoolDir = /usr/local/xialei/hdfs_test
agent.sources.s.fileHeader = true
agent.sources.s.batchSize = 100
# The channel can be defined as follows.  
agent.sources.s.channels = c c1  
# Each sink's type must be defined  
#agent.sinks.r.type = org.apache.flume.plugins.KafkaSink  
agent.sinks.r.type = org.apache.flume.sink.kafka.KafkaSink  
#Specify the channel the sink should use  
#agent.sinks.r.metadata.broker.list = localhost:9092  
agent.sinks.r.kafka.bootstrap.servers = 192.168.48.144:9092  
agent.sinks.r.partition.key=0  
agent.sinks.r.partitioner.class=org.apache.flume.plugins.SinglePartition  
agent.sinks.r.serializer.class=kafka.serializer.StringEncoder  
agent.sinks.r.request.required.acks=0  
agent.sinks.r.max.message.size=1000000  
agent.sinks.r.producer.type=sync  
agent.sinks.r.custom.encoding=UTF-8  
agent.sinks.r.kafka.topic = xltopic  
# Each channel's type is defined.  
# Other config values specific to each type of channel(sink or source)  
# can be defined as well  
# In this case, it specifies the capacity of the memory channel  
# channel capacity
#agent.channels.memoryChannel.capacity = 1000
# transaction capacity
#agent.channels.memoryChannel.transactionCapacity = 100
# HDFS sink configuration
agent.sinks.r1.type = hdfs  
agent.sinks.r1.channel = c1  
agent.sinks.r1.hdfs.path = hdfs://192.168.48.144:9000/flume1/%y-%m-%d/%H%M/
agent.sinks.r1.hdfs.filePrefix=events-  
# file name suffix
#agent.sinks.r1.hdfs.fileSuffix = .log
agent.sinks.r1.hdfs.round = true  
agent.sinks.r1.hdfs.roundValue = 10  
agent.sinks.r1.hdfs.roundUnit = minute  
# file format; the default is a SequenceFile
agent.sinks.r1.hdfs.fileType = DataStream  
agent.sinks.r1.hdfs.writeFormat=Text  
agent.sinks.r1.hdfs.rollInterval=0  
# file size that triggers a roll, in bytes (0: never roll based on file size)
agent.sinks.r1.hdfs.rollSize=128000000
# number of events written before a roll (0 = never roll based on number of events)
agent.sinks.r1.hdfs.rollCount=0  
agent.sinks.r1.hdfs.idleTimeout=60  
# use local time to fill in the escape sequences in the path (instead of the timestamp from the event header)
agent.sinks.r1.hdfs.useLocalTimeStamp = true  
agent.channels.c1.type = memory  
agent.channels.c1.capacity = 1000  
agent.channels.c1.transactionCapacity=1000  
agent.channels.c1.keep-alive=30  
agent.sinks.r.channel = c  
agent.channels.c.type = memory  
agent.channels.c.capacity = 1000  

Note: this configuration watches the /usr/local/xialei/hdfs_test directory and defines two sinks, r and r1: one sends the data to Kafka for consumption (the topic name must match the topic that the Kafka consumer on hadoop1 is subscribed to, otherwise no data will flow), and the other sinks the data into HDFS for storage. Remember to adjust the IP addresses.

5.4 Start Flume

cd into Flume's bin directory and run:

sh flume-ng agent  -c ../conf -f ../conf/spooldir-kafka-hdfs.conf -n agent  -Dflume.root.logger=INFO,console &

Check the printed log to confirm that both r and r1 started successfully.

5.5 Test

Once Flume has started successfully, we can begin testing.

Test one: the configuration currently monitors a directory, so create a new file inside the monitored directory.

vim testfile

Save it with :wq.

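The file contents are visible in the HDFS output later; the test file contained:

hello
flume
kafka
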
Check the output printed by the Kafka consumer; the content appears as expected.

Then check the HDFS directory:

[root@hadoop1 ~]# hdfs dfs -ls /flume1/
Found 1 items
drwxr-xr-x   - root supergroup          0 2019-03-26 20:55 /flume1/19-03-26

A new directory named after the date has appeared (exactly as configured in the Flume configuration file). Look at its contents:

[root@hadoop1 ~]# hdfs dfs -cat /flume1/19-03-26/2100/events-.1553659399314.tmp
hello
flume
kafka

Flume is successfully monitoring the directory, passing changes to Kafka for real-time consumption, and sinking them into HDFS.

Test two:

First stop Flume (kill the process).

Modify the Flume configuration file:

Change sources.s.type back to exec and use a command to tail a log file. By now you can probably guess what I want to test: whether Tomcat's log can be shown in Kafka in real time and stored in HDFS. The relevant change is sketched after this paragraph.

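The change in spooldir-kafka-hdfs.conf, following the commented-out exec example already in the file (the Tomcat log path is an assumption; point it at your own catalina.out):

#agent.sources.s.type = spooldir
#agent.sources.s.spoolDir = /usr/local/xialei/hdfs_test
agent.sources.s.type = exec
agent.sources.s.command = tail -n +0 -F /usr/local/tomcat/logs/catalina.out
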
Start Flume, then start Tomcat (an empty Tomcat, used only to generate logs; mind the log path).

Check the Kafka consumer:

The log lines are printed as they arrive, which shows that the tail command is being picked up.

The HDFS directory likewise contains a new file; view its contents with the hdfs command.

The log entries are stored in HDFS as expected.

Learning never ends and technology evolves every day; let's keep studying and pushing forward together!

Reposted from blog.csdn.net/lxia_wanshui/article/details/88765209