Flume is commonly used to collect system logs in one of two ways:
- Flume tails the log files; this requires no changes to the application code.
- The application sends its logs directly to the Flume service; this article covers this approach.
Preparation:
- A Linux machine (a VM is fine) with Flume, ZooKeeper, and Kafka installed
- A development machine with a Java environment
-
Flume configuration
Detailed guides for installing Flume are widely available online, so installation is not covered here.
After installation, the main configuration is as follows.
File: conf/flume-conf.properties
agent.sources = s1
agent.channels = c1
agent.sinks = k1
# For each one of the sources, the type is defined
agent.sources.s1.type = avro
agent.sources.s1.bind = 0.0.0.0
agent.sources.s1.port = 8888
# The channel can be defined as follows.
agent.sources.s1.channels = c1
# Each sink's type must be defined
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.kafka.topic = test
agent.sinks.k1.kafka.bootstrap.servers = 127.0.0.1:9092
#Specify the channel the sink should use
agent.sinks.k1.channel = c1
# Each channel's type is defined.
agent.channels.c1.type = memory
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.c1.capacity = 100
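The memory channel buffers events between the avro source and the Kafka sink, and capacity = 100 caps that buffer. As a rough analogy only (this is not Flume's actual implementation), a bounded queue behaves the same way: once the sink falls behind and the buffer fills, new events are rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class MemoryChannelSketch {
    public static void main(String[] args) {
        // Analogy for agent.channels.c1.capacity = 100: a bounded in-memory buffer.
        ArrayBlockingQueue<String> channel = new ArrayBlockingQueue<>(100);

        // Source side: offer events; this succeeds while there is room.
        for (int i = 0; i < 100; i++) {
            channel.offer("event-" + i);
        }
        // Buffer is full: the next event is rejected (Flume would raise an
        // exception that the avro client sees as a failed send).
        boolean accepted = channel.offer("event-100");
        System.out.println("accepted=" + accepted + ", buffered=" + channel.size());
    }
}
```

This is why the capacity should be sized against the expected burst rate when the Kafka sink is temporarily slow.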
Command to start the Flume agent:
# if the flume bin directory is not on your PATH
cd <flume installation directory>
bin/flume-ng agent --conf conf -f conf/flume-conf.properties -n agent -Dflume.root.logger=INFO,console
Note that the name passed to -n must match the property prefix used in the configuration file (agent. in this case).
-
ZooKeeper configuration
Kafka depends on ZooKeeper, so ZooKeeper must be installed as well.
Configuration file: zoo.cfg
# the basic time unit used by ZooKeeper, in milliseconds
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage; /tmp here is just
# for example's sake.
# data directory
dataDir=/tmp/zookeeper/data
# transaction log directory
dataLogDir=/tmp/zookeeper/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
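Note that initLimit and syncLimit are expressed in ticks, not milliseconds, so the effective timeouts follow from tickTime. A quick check of the values above:

```java
public class ZkTimeouts {
    public static void main(String[] args) {
        int tickTimeMs = 2000; // tickTime from zoo.cfg
        int initLimit = 10;    // ticks allowed for the initial sync with the leader
        int syncLimit = 5;     // ticks allowed between a request and its ack

        // Effective timeouts are the tick counts multiplied by the tick length.
        System.out.println("init timeout ms = " + (initLimit * tickTimeMs));
        System.out.println("sync timeout ms = " + (syncLimit * tickTimeMs));
    }
}
```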
In the configuration above, adjust the dataDir and dataLogDir paths to suit your environment; avoid /tmp in production, since it may be cleared on reboot.
Start the ZooKeeper service: zkServer.sh start
-
Kafka configuration
Key settings in Kafka's config/server.properties:
# user-defined; must be unique within the cluster
broker.id=10
# the address producers and consumers use to reach the broker;
# it must be reachable from the Flume host
advertised.listeners=PLAINTEXT://127.0.0.1:9092
The settings in config/zookeeper.properties must match the ZooKeeper configuration above, e.g.:
# the directory where the snapshot is stored.
dataDir=/tmp/zookeeper/data
# the port at which the clients will connect
clientPort=2181
Start the Kafka service: bin/kafka-server-start.sh config/server.properties
Next, create the topic. Its name must match the kafka.topic configured on the Flume sink:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Command to consume the topic (useful for verifying the pipeline):
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
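The topic above is created with a single partition, so every event lands in partition 0. With more partitions, Kafka's default partitioner spreads keyed records by hashing the key; the real implementation uses murmur2, and the sketch below substitutes hashCode purely for illustration.

```java
public class PartitionSketch {
    // Illustrative only: Kafka's DefaultPartitioner uses murmur2, not hashCode.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // With the single-partition topic created above, every record maps to 0.
        System.out.println(partitionFor("order-42", 1));
        // More partitions allow more parallel consumers in a consumer group.
        System.out.println(partitionFor("order-42", 4));
    }
}
```

This is why the partition count, not the consumer count, caps consumption parallelism.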
These are the basic configuration points for each component. In production, deploy multiple Flume nodes as the business requires, run ZooKeeper and Kafka as clusters, and tune each service's configuration parameters.
-
Java log4j2 configuration
Source code for the Spring Boot demo application: https://download.csdn.net/download/spring410/10971990
Note the dependency configuration in the Spring Boot project's pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
    <!-- exclude the default logging starter -->
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <!-- exclude the default logging starter -->
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-log4j2</artifactId>
</dependency>
<!-- jar required for writing to flume -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-flume-ng</artifactId>
</dependency>
Key points of log4j2.xml (adjust the host and port for your environment):
<Properties>
    <!-- log file output directory -->
    <Property name="file_path">logs</Property>
    <Property name="app">findlogs</Property>
    <Property name="log_pattern">%d [${app}] [%class{1.}.%method:%line] [%level] - %m%n</Property>
</Properties>
<Flume name="FlumeAppender" compress="false" type="avro" ignoreExceptions="false">
    <Agent host="10.211.55.5" port="8888"/>
    <PatternLayout charset="UTF-8" pattern="${log_pattern}" />
</Flume>
<Loggers>
    <root level="INFO">
        <appender-ref ref="Console"/>
        <appender-ref ref="FlumeAppender"/>
    </root>
</Loggers>
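To make log_pattern concrete, a line it produces looks roughly like the output of the sketch below. The sketch builds one by hand for illustration only; log4j2 does all of this internally, and the class, method, and line values here are made up.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class LogPatternSketch {
    public static void main(String[] args) {
        // Hand-built approximation of:
        // %d [${app}] [%class{1.}.%method:%line] [%level] - %m%n
        String timestamp = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS").format(new Date());
        String app = "findlogs";                          // ${app} property from log4j2.xml
        String location = "c.e.DemoController.hello:23";  // abbreviated class, method, line (invented)
        String level = "INFO";
        String message = "request received";

        String line = String.format("%s [%s] [%s] [%s] - %s",
                timestamp, app, location, level, message);
        System.out.println(line);
    }
}
```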
After the application starts, its log output can be seen on the Kafka console consumer.
Below are the logs printed by the program.
This approach has a drawback, however: the application depends on the Flume service, so if Flume goes down the application is affected, as shown below:
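The coupling comes from ignoreExceptions="false" on the appender: a failure to reach Flume propagates into the logging call itself, whereas true would make the appender swallow the error (at the cost of silently losing events). A simplified sketch of the difference; the Appender interface here is hypothetical, not log4j2's:

```java
public class AppendFailureSketch {
    // Hypothetical stand-in for an appender whose remote endpoint is down.
    interface Appender { void append(String event) throws Exception; }

    static void log(Appender appender, boolean ignoreExceptions, String event) {
        try {
            appender.append(event);
        } catch (Exception e) {
            if (!ignoreExceptions) {
                // Mirrors ignoreExceptions="false": the business thread sees the failure.
                throw new RuntimeException("logging failed", e);
            }
            // Mirrors ignoreExceptions="true": the error is swallowed.
        }
    }

    public static void main(String[] args) {
        Appender down = event -> { throw new Exception("flume unreachable"); };

        log(down, true, "order created");          // swallowed, business code continues
        System.out.println("continued after swallowed failure");
        try {
            log(down, false, "order created");     // propagates into the caller
        } catch (RuntimeException e) {
            System.out.println("business code hit: " + e.getMessage());
        }
    }
}
```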
So what we actually use in production is a way of attaching to Flume dynamically, described in the next part.