Collecting distributed logs from Docker containers with Flume + Kafka: a practical approach

 

1 Background and problem

With the spread of cloud computing and PaaS platforms, and the adoption of virtualization and containerization technologies such as Docker, more and more services are being deployed in the cloud. We usually need the logs of those services for monitoring, analysis, prediction, statistics and other purposes, but cloud services are not fixed physical resources, so getting at the logs has become harder. What used to be a matter of SSH or FTP access is no longer so easy, yet it is exactly what engineers urgently need. The most typical scenario: during a release, everything on a GUI-based PaaS platform is done with mouse clicks, but we still want to watch the logs with commands such as tail -F and grep to decide whether the release succeeded. Admittedly, a complete PaaS platform would do this work for us, but there remain many ad-hoc needs that the platform cannot satisfy, and for those we need the logs. This article presents a method for centrally collecting the scattered logs of containerized services in a distributed environment.
 

2 Design constraints and requirements description

Before doing any design, it is necessary to clarify the application scenarios, functional requirements and non-functional requirements.

2.1 Application Scenario

In a distributed environment, the system must handle logs generated by hundreds of servers. A single log entry is typically under 1 KB and never larger than 50 KB; the total log volume is under 500 GB per day.
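As a rough sanity check on these figures, 500 GB per day averages out to about 6 MB/s, i.e. on the order of a few thousand 1 KB log entries per second across the whole cluster (with higher peaks), which is comfortably within what a small Kafka cluster fed by a set of Flume agents can sustain.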

2.2 Functional requirements

1) Collect all service logs centrally.
2) The source of each log must be distinguishable, and logs must be split at the granularity of service, module and day.

2.3 Non-functional requirements

1) It must not intrude on the service process. Log collection is deployed as an independent component, and its use of system resources must be bounded.
2) Near real time, low latency: the delay from a log being produced to it reaching centralized storage is under 4s.
3) Persistence: keep the last N days.
4) Best-effort delivery is enough. Exactly-once is not required; logs may occasionally be lost or duplicated, but the rate must stay below a threshold (for example, 1/10,000).
5) Lack of strict ordering can be tolerated.
6) Log collection is an auxiliary, offline function, so availability requirements are modest; three nines (99.9%) over the year is sufficient.
 

3 Implementation Architecture

The architecture of one possible implementation is shown in the figure below:

3.1 Producer layer analysis

The services on the PaaS platform are assumed to be deployed in Docker containers, so to satisfy non-functional requirement #1 an independent process is responsible for collecting the logs; it does not intrude on the service framework or process. Flume NG is used for collection. This open-source component is very capable and can be viewed as a model of monitoring, incremental production, and publish/consume: the Source is where the increments come from, the Channel is the buffering channel (a memory queue here), and the Sink is the slot where data is consumed. Inside the container, the source executes the tail -F command and reads the incremental log lines from Linux standard output; the sink is the Kafka implementation, which pushes messages to the distributed message middleware.
 

3.2 Broker layer analysis

Since the PaaS platform runs many containers, many Flume NG clients push messages to the Kafka message middleware. Kafka is a message middleware with very high throughput and performance: within a single partition it writes sequentially and supports random reads by offset, so it fits the topic publish/subscribe model very well. The figure shows several Kafka brokers because Kafka supports clustering; the Flume NG client in a container can publish logs to several brokers, or, put differently, to several partitions under a topic. This achieves high throughput in two ways: first, Flume NG can pack messages and send them in batches, reducing QPS pressure; second, writes are spread across multiple partitions. Kafka also replicates each partition, so that after writing to the leader the data must be written to N replicas. The replication factor is set to 2 here, rather than the 3 commonly used in distributed systems, in order to preserve high write concurrency and to satisfy non-functional requirement #4.
 

3.3 Consumer layer analysis

The consumer of the Kafka increments is again a Flume NG. Its strength is that it can attach to almost any data source through pluggable implementations with only a small amount of configuration. Here the Kafka Source is used to subscribe to the topic; the collected logs again pass through a memory buffer and are then written to files by a file sink. To satisfy functional requirement #2, distinguishing sources and splitting by service, module and day, I implemented my own sink called RollingByTypeAndDayFileSink. The source code is on github; you can download the jar from that page and put it directly into flume's lib directory.
 

4 Practical methods

4.1 Configuration inside the container

Dockerfile

The Dockerfile defines how the program is built and run inside the container and contains a number of Docker directives. Below is a typical Dockerfile. BASE_IMAGE is an image that contains both the program to run and the Flume binaries. The most important line is ENTRYPOINT, which uses supervisord to keep the processes inside the container highly available.
FROM ${BASE_IMAGE}
MAINTAINER ${MAINTAINER}
ENV REFRESH_AT ${REFRESH_AT}
RUN mkdir -p /opt/${MODULE_NAME}
ADD ${PACKAGE_NAME} /opt/${MODULE_NAME}/
COPY service.supervisord.conf /etc/supervisord.conf.d/service.supervisord.conf
COPY supervisor-msoa-wrapper.sh /opt/${MODULE_NAME}/supervisor-msoa-wrapper.sh
RUN chmod +x /opt/${MODULE_NAME}/supervisor-msoa-wrapper.sh
RUN chmod +x /opt/${MODULE_NAME}/*.sh
EXPOSE
ENTRYPOINT ["/usr/bin/supervisord", "-c", "/etc/supervisord.conf"]
Below is the supervisord configuration file; it executes the supervisor-msoa-wrapper.sh script.
[program:${MODULE_NAME}]
command=/opt/${MODULE_NAME}/supervisor-msoa-wrapper.sh
Below is supervisor-msoa-wrapper.sh. The start.sh and stop.sh it calls are the application's own start and stop scripts. The background here is that our start and stop scripts launch the application in the background, so they do not block the current process; if the wrapper simply exited, Docker would consider the program finished and the application's life cycle would be over. The wait command is therefore used to block, which ensures that even though the process runs in the background, from Docker's point of view it behaves as if it were running in the foreground.
 
The command that starts Flume is also added here. The directory given after --conf is where Flume looks for flume-env.sh, which can define JAVA_HOME and JAVA_OPTS; --conf-file points to the configuration file that defines Flume's actual source, channel, sink and so on.
#! /bin/bash
function shutdown()
{
    date
    echo "Shutting down Service"
    unset SERVICE_PID # Necessary in some cases
    cd /opt/${MODULE_NAME}
    source stop.sh
}
 
## stop process
cd /opt/${MODULE_NAME}
echo "Stopping Service"
source stop.sh
 
## start the process
echo "Starting Service"
source start.sh
export SERVICE_PID=$!
 
## Start the Flume NG agent; sleep 4s first so start.sh has begun producing logs
sleep 4
nohup /opt/apache-flume-1.6.0-bin/bin/flume-ng agent --conf /opt/apache-flume-1.6.0-bin/conf --conf-file /opt/apache-flume-1.6.0-bin/conf/logback-to-kafka.conf --name a1 -Dflume.root.logger=INFO,console &
 
# Allow any signal which would kill a process to stop Service
trap shutdown HUP INT QUIT ABRT KILL ALRM TERM TSTP
 
echo "Waiting for $SERVICE_PID"
wait $SERVICE_PID

Flume configuration

The source would normally be an exec source that simply runs tail -F on the log file. Here, however, a self-developed StaticLinePrefixExecSource is used; its source code can be found on github. The reason for a custom source is that some fixed information needs to be carried along, such as the service/module name and the hostname of the container the distributed service runs in, so that the collector can distinguish logs by these tags. If you are wondering why flume's interceptors are not used for this work, since adding a few KVs to the header would seem to do the job: that is a small pit, explained later.
 
For example, a line of the original log:
[INFO]  2016-03-18 12:59:31,080 [main]  fountain.runner.CustomConsumerFactoryPostProcessor      (CustomConsumerFactoryPostProcessor.java:91)    -Start to init IoC container by loading XML bean definitions from classpath:fountain-consumer-stdout.xml
According to the following configuration, the log actually passed to the Channel is:
service1##$$##m1-ocean-1004.cp  [INFO]  2016-03-18 12:59:31,080 [main]  fountain.runner.CustomConsumerFactoryPostProcessor      (CustomConsumerFactoryPostProcessor.java:91)    -Start to init IoC container by loading XML bean definitions from classpath:fountain-consumer-stdout.xml
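To make the mechanism concrete, the following is a minimal, hypothetical Java sketch of such a line-prefixing exec-style source. It is not the actual StaticLinePrefixExecSource from github (which also handles restarts, batching and charsets properly); the class and package names are invented, the exact joining whitespace is an assumption, and only the decoration step is the point:

package com.example.flume; // hypothetical package, not the author's real one

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.flume.Context;
import org.apache.flume.EventDrivenSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

/** Simplified illustration: run a command (e.g. tail -F) and prefix every line. */
public class LinePrefixExecSourceSketch extends AbstractSource
        implements EventDrivenSource, Configurable {

    private String command;    // e.g. "tail -F /opt/MODULE_NAME/log/logback.log"
    private String prefix;     // e.g. "service1" (service/module name)
    private String separator;  // e.g. "##$$##"
    private String suffix;     // e.g. "m1-ocean-1004.cp" (container hostname)
    private volatile boolean running;

    @Override
    public void configure(Context context) {
        command = context.getString("command");
        prefix = context.getString("prefix", "default");
        separator = context.getString("separator", "##$$##");
        suffix = context.getString("suffix", "");
    }

    @Override
    public synchronized void start() {
        running = true;
        Thread runner = new Thread(() -> {
            try {
                // Run the command and read its standard output line by line.
                Process process = new ProcessBuilder("/bin/sh", "-c", command)
                        .redirectErrorStream(true).start();
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
                String line;
                while (running && (line = reader.readLine()) != null) {
                    // Decorate the raw log line so the collector can tell where it came from.
                    String decorated = prefix + separator + suffix + " " + line;
                    getChannelProcessor().processEvent(
                            EventBuilder.withBody(decorated, StandardCharsets.UTF_8));
                }
            } catch (Exception e) {
                e.printStackTrace(); // a real source would report this through Flume's counters
            }
        });
        runner.start();
        super.start();
    }

    @Override
    public synchronized void stop() {
        running = false;
        super.stop();
    }
}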
The channel is a memory buffer queue whose size specifies how many log events it can hold (event capacity). The transaction controls how many events are taken from the source, and handed to the sink, in a single batch. Internally there is a timeout, configurable through the keepAlive parameter (3s by default); when it expires, the events are pushed onward even if the batch is not full.
 
The sink is the Kafka sink. It is configured with the broker list and the topic name, whether an ACK is required, and the number of log entries sent in one batch (5 per packet by default; if concurrency is high, this value can be increased to improve throughput).
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
 
a1.sources.r1.type = com.baidu.unbiz.flume.sink.StaticLinePrefixExecSource
a1.sources.r1.command = tail -F /opt/MODULE_NAME/log/logback.log
a1.sources.r1.channels = c1
a1.sources.r1.prefix=service1
a1.sources.r1.separator=##$$##
a1.sources.r1.suffix=m1-ocean-1004.cp
 
# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = keplerlog
a1.sinks.k1.brokerList = gzns-cm-201508c02n01.gzns:9092,gzns-cm-201508c02n02.gzns:9092
a1.sinks.k1.requiredAcks = 0
a1.sinks.k1.batchSize = 5
 
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100
 
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

4.2 Broker configuration

Following the official Kafka tutorial, create a new topic named keplerlog with a replication factor of 2 and 4 partitions.
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 4 --topic keplerlog

Produce some incremental data, for example with the following command; arbitrary strings can be typed into the terminal:

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic keplerlog

Open another terminal and subscribe to the topic; if the strings typed into the producer show up, the pipeline is connected.

> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic keplerlog --from-beginning

4.3 Configuration for centralized log reception

Flume configuration

The source is the official KafkaSource: configure the zookeeper address and it will locate the available broker list and subscribe to the logs. The channel is again a memory buffer queue. As for the sink, our requirement is to split logs by service name and date, while the official default file roll sink can only split by timestamp and time interval, hence the custom sink below.
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
 
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.zookeeperConnect = localhost:2181
a1.sources.r1.topic = keplerlog
a1.sources.r1.batchSize = 5
a1.sources.r1.groupId = flume-collector
a1.sources.r1.kafka.consumer.timeout.ms = 800
 
# Describe the sink
a1.sinks.k1.type = com.baidu.unbiz.flume.sink.RollingByTypeAndDayFileSink
a1.sinks.k1.channel = c1
a1.sinks.k1.sink.directory = /home/work/data/kepler-log
 
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100
 
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

The custom RollingByTypeAndDayFileSink

The source code is on github. RollingByTypeAndDayFileSink has two usage conditions:
1) The event header must contain a timestamp; otherwise the event is ignored and an {@link InputNotSpecifiedException} is thrown.
2) If the event body is delimited by ##$$##, the string before the delimiter is treated as the module name; if not, the file name defaults to default.

The sink writes to local files. First a root directory is set through sink.directory. Then the module name extracted according to condition #2 is used as the file name prefix, and the day taken from the timestamp as the file name suffix, for example portal.20150606 or default.20150703.
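As an illustration of how the target file name can be derived from an event, here is a small hypothetical Java helper. It is not the real sink (which also manages open file streams, rolling and error handling), and the class name and the "timestamp" header key are assumptions based on the description above:

import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

/** Hypothetical helper: derive "module.yyyyMMdd" from a Flume event's header and body. */
public class RollingFileNameSketch {

    private static final String SEPARATOR = "##$$##";

    /** Returns e.g. "portal.20160512" or "default.20160513". */
    public static String fileNameFor(Map<String, String> headers, byte[] body) {
        String timestamp = headers.get("timestamp"); // assumed header key
        if (timestamp == null) {
            // the real sink throws InputNotSpecifiedException and ignores the event
            throw new IllegalArgumentException("event header must carry a timestamp");
        }
        String day = new SimpleDateFormat("yyyyMMdd")
                .format(new Date(Long.parseLong(timestamp)));

        String line = new String(body, StandardCharsets.UTF_8);
        int idx = line.indexOf(SEPARATOR);
        // the text before the separator is the module name; otherwise fall back to "default"
        String module = (idx > 0) ? line.substring(0, idx) : "default";
        return module + "." + day;
    }
}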
 
After tidying up, the directory looks like the listing below; logs from many services are gathered together and separated by service name and date:
~/data/kepler-log$ ls
authorization.20160512  
default.20160513  
default.20160505 
portal.20160512       
portal.20160505   
portal.20160514

Two pitfalls that must be mentioned

Pitfall 1

Back to the custom StaticLinePrefixExecSource introduced in the earlier sections, which adds a prefix to each line. Since the originating service/module name must be distinguishable and the logs must be split by time, the official flume documentation suggests that the following source interceptor configuration should be all that is needed: i1 is a timestamp interceptor, and i2 is a static interceptor that adds the fixed KV key=module, value=portal.
a1.sources.r1.interceptors = i2 i1
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i2.type = static
a1.sources.r1.interceptors.i2.key = module
a1.sources.r1.interceptors.i2.value = portal
However, this is what the implementation of flume's official default KafkaSource (v1.6.0) looks like:
while (eventList.size() < batchUpperLimit &&
        System.currentTimeMillis() < batchEndTime) {
  iterStatus = hasNext();
  if (iterStatus) {
    // get next message
    MessageAndMetadata<byte[], byte[]> messageAndMetadata = it.next();
    kafkaMessage = messageAndMetadata.message();
    kafkaKey = messageAndMetadata.key();

    // Add headers to event (topic, timestamp, and key)
    headers = new HashMap<String, String>();
    headers.put(KafkaSourceConstants.TIMESTAMP,
            String.valueOf(System.currentTimeMillis()));
    headers.put(KafkaSourceConstants.TOPIC, topic);
    if (kafkaKey != null) {
      headers.put(KafkaSourceConstants.KEY, new String(kafkaKey));
    }
    if (log.isDebugEnabled()) {
      log.debug("Message: {}", new String(kafkaMessage));
    }
    event = EventBuilder.withBody(kafkaMessage, headers);
    eventList.add(event);
  }
As can be seen, the source rewrites the KVs in the event header itself and discards the headers that were sent over. Because of this pit, the module/service name is instead put at the front of the event body by the tail -F source, and RollingByTypeAndDayFileSink then splits on the separator; otherwise the KVs would never reach the downstream.
 

Pitfall 2

The exec source runs the tail -F command and reads standard output and standard error line by line. However, if tail -F is wrapped in a script that pipes through further commands, for example tail -F logback.log | awk '{print "portal##$$##"$0}', the exec source always lags behind the most recent output: some lines appended to the end of the file arrive late and are only "squeezed out" once newer lines are appended. This behaviour is rather odd (it looks consistent with output buffering in the piped command, though that has not been verified), and I have not studied it closely; it is recorded here so that later readers can avoid the pit.
 

5 Conclusion

This approach to centrally collecting the scattered logs of distributed services shows that open-source components make it very convenient to solve the problems we encounter in daily work, and the ability to spot and solve such problems is a basic requirement for an engineer. Where a component does not meet the requirements, you need the curiosity to understand not just how it works but why, and to do some ad-hoc work of your own, in order to leverage these components well.
 
Moreover, collecting the logs is only the starting point. With such valuable data in hand, the downstream use cases and the room for imagination are enormous, for example:
1) Use Spark Streaming to compute over the logs within a time window, for traffic control and access throttling.
2) Use awk scripts or Scala's higher-order functions for single-machine access statistics, or Hadoop/Spark for large-scale statistical analysis.
3) Beyond port-liveness and semantic monitoring, process the logs in real time to filter out ERRORs and exceptions, providing genuine service health assurance and early warning.
4) Import the collected logs into Elasticsearch via logstash and use the ELK stack for log queries.
 
Reposted from neoremind.com
