Flume is commonly used to collect system logs in one of two ways:
- Flume tails the log files; this requires no changes to the application code.
- The application sends its logs directly to the Flume service; this article covers this approach.
Preparation:
- A Linux machine (a VM is fine) with Flume, ZooKeeper, and Kafka installed
- A development machine with a Java environment
-
Flume configuration
Detailed guides for installing Flume are widely available online, so installation is not covered here.
After installation, the main configuration is as follows.
File: conf/flume-conf.properties
agent.sources = s1
agent.channels = c1
agent.sinks = k1
# For each one of the sources, the type is defined
agent.sources.s1.type = avro
agent.sources.s1.bind = 0.0.0.0
agent.sources.s1.port = 8888
# The channel can be defined as follows.
agent.sources.s1.channels = c1
# Each sink's type must be defined
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.kafka.topic = test
agent.sinks.k1.kafka.bootstrap.servers = 127.0.0.1:9092
#Specify the channel the sink should use
agent.sinks.k1.channel = c1
# Each channel's type is defined.
agent.channels.c1.type = memory
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.c1.capacity = 100
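The memory channel buffers events between the avro source and the Kafka sink, and capacity = 100 caps that buffer. As a rough analogy only (this is not Flume's actual implementation), a bounded queue behaves the same way: once the sink falls behind and the buffer fills, new events are rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class MemoryChannelSketch {
    public static void main(String[] args) {
        // Analogy for agent.channels.c1.capacity = 100: a bounded in-memory buffer.
        ArrayBlockingQueue<String> channel = new ArrayBlockingQueue<>(100);

        // Source side: offer events; this succeeds while there is room.
        for (int i = 0; i < 100; i++) {
            channel.offer("event-" + i);
        }
        // Buffer is full: the next event is rejected (Flume would raise an
        // exception that the avro client sees as a failed send).
        boolean accepted = channel.offer("event-100");
        System.out.println("accepted=" + accepted + ", buffered=" + channel.size());
    }
}
```

This is why the capacity should be sized against the expected burst rate when the Kafka sink is temporarily slow.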
Command to start the Flume agent:
# if the flume bin directory is not on your PATH
cd <flume installation directory>
bin/flume-ng agent --conf conf -f conf/flume-conf.properties -n agent -Dflume.root.logger=INFO,console
Note that the name passed to -n must match the property prefix used in the configuration file (agent. in this case).
-
ZooKeeper configuration
Kafka depends on ZooKeeper, so ZooKeeper must be installed as well.
Configuration file: zoo.cfg
# the basic time unit used by ZooKeeper, in milliseconds
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage; /tmp here is just
# for example's sake.
# data directory
dataDir=/tmp/zookeeper/data
# transaction log directory
dataLogDir=/tmp/zookeeper/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
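Note that initLimit and syncLimit are expressed in ticks, not milliseconds, so the effective timeouts follow from tickTime. A quick check of the values above:

```java
public class ZkTimeouts {
    public static void main(String[] args) {
        int tickTimeMs = 2000; // tickTime from zoo.cfg
        int initLimit = 10;    // ticks allowed for the initial sync with the leader
        int syncLimit = 5;     // ticks allowed between a request and its ack

        // Effective timeouts are the tick counts multiplied by the tick length.
        System.out.println("init timeout ms = " + (initLimit * tickTimeMs));
        System.out.println("sync timeout ms = " + (syncLimit * tickTimeMs));
    }
}
```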
In the configuration above, adjust the dataDir and dataLogDir paths to suit your environment; avoid /tmp in production, since it may be cleared on reboot.
Start the ZooKeeper service: zkServer.sh start
-
Kafka configuration
Key settings in Kafka's config/server.properties:
# user-defined; must be unique within the cluster
broker.id=10
# the address producers and consumers use to reach the broker;
# it must be reachable from the Flume host
advertised.listeners=PLAINTEXT://127.0.0.1:9092
The settings in config/zookeeper.properties must match the ZooKeeper configuration above, e.g.:
# the directory where the snapshot is stored.
dataDir=/tmp/zookeeper/data
# the port at which the clients will connect
clientPort=2181
Start the Kafka service: bin/kafka-server-start.sh config/server.properties
Next, create the topic. Its name must match the kafka.topic configured on the Flume sink:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Command to consume the topic (useful for verifying the pipeline):
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
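The topic above is created with a single partition, so every event lands in partition 0. With more partitions, Kafka's default partitioner spreads keyed records by hashing the key; the real implementation uses murmur2, and the sketch below substitutes hashCode purely for illustration.

```java
public class PartitionSketch {
    // Illustrative only: Kafka's DefaultPartitioner uses murmur2, not hashCode.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // With the single-partition topic created above, every record maps to 0.
        System.out.println(partitionFor("order-42", 1));
        // More partitions allow more parallel consumers in a consumer group.
        System.out.println(partitionFor("order-42", 4));
    }
}
```

This is why the partition count, not the consumer count, caps consumption parallelism.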
These are the basic configuration points for each component. In production, deploy multiple Flume nodes as the business requires, run ZooKeeper and Kafka as clusters, and tune each service's configuration parameters.
-
Java log4j2 configuration
Source code for the Spring Boot demo application: https://download.csdn.net/download/spring410/10971990
Note the dependency configuration in the Spring Boot project's pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
    <!-- exclude the default logging starter -->
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <!-- exclude the default logging starter -->
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-log4j2</artifactId>
</dependency>
<!-- jar required for writing to flume -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-flume-ng</artifactId>
</dependency>
Key points of log4j2.xml (adjust the host and port for your environment):
<Properties>
    <!-- log file output directory -->
    <Property name="file_path">logs</Property>
    <Property name="app">findlogs</Property>
    <Property name="log_pattern">%d [${app}] [%class{1.}.%method:%line] [%level] - %m%n</Property>
</Properties>
<Flume name="FlumeAppender" compress="false" type="avro" ignoreExceptions="false">
    <Agent host="10.211.55.5" port="8888"/>
    <PatternLayout charset="UTF-8" pattern="${log_pattern}" />
</Flume>
<Loggers>
    <root level="INFO">
        <appender-ref ref="Console"/>
        <appender-ref ref="FlumeAppender"/>
    </root>
</Loggers>
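To make log_pattern concrete, a line it produces looks roughly like the output of the sketch below. The sketch builds one by hand for illustration only; log4j2 does all of this internally, and the class, method, and line values here are made up.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class LogPatternSketch {
    public static void main(String[] args) {
        // Hand-built approximation of:
        // %d [${app}] [%class{1.}.%method:%line] [%level] - %m%n
        String timestamp = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS").format(new Date());
        String app = "findlogs";                          // ${app} property from log4j2.xml
        String location = "c.e.DemoController.hello:23";  // abbreviated class, method, line (invented)
        String level = "INFO";
        String message = "request received";

        String line = String.format("%s [%s] [%s] [%s] - %s",
                timestamp, app, location, level, message);
        System.out.println(line);
    }
}
```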
After the application starts, its log output can be seen on the Kafka console consumer.
Below are the logs printed by the program.
This approach has a drawback, however: the application depends on the Flume service, so if Flume goes down the application is affected, as shown below:
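The coupling comes from ignoreExceptions="false" on the appender: a failure to reach Flume propagates into the logging call itself, whereas true would make the appender swallow the error (at the cost of silently losing events). A simplified sketch of the difference; the Appender interface here is hypothetical, not log4j2's:

```java
public class AppendFailureSketch {
    // Hypothetical stand-in for an appender whose remote endpoint is down.
    interface Appender { void append(String event) throws Exception; }

    static void log(Appender appender, boolean ignoreExceptions, String event) {
        try {
            appender.append(event);
        } catch (Exception e) {
            if (!ignoreExceptions) {
                // Mirrors ignoreExceptions="false": the business thread sees the failure.
                throw new RuntimeException("logging failed", e);
            }
            // Mirrors ignoreExceptions="true": the error is swallowed.
        }
    }

    public static void main(String[] args) {
        Appender down = event -> { throw new Exception("flume unreachable"); };

        log(down, true, "order created");          // swallowed, business code continues
        System.out.println("continued after swallowed failure");
        try {
            log(down, false, "order created");     // propagates into the caller
        } catch (RuntimeException e) {
            System.out.println("business code hit: " + e.getMessage());
        }
    }
}
```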
So what we actually use in production is a way of attaching to Flume dynamically, described in the next part.