Best Practices for Log Collection

Overview

This article introduces how to use the logging features of Tencent Cloud Container Service (TKE) to collect, store, and query logs, analyzes how each feature is used and in which scenarios, and offers some best-practice suggestions.

Note: This article applies only to TKE clusters.

How to get started quickly?

The log collection entry in the TKE console is Cluster Ops > Log Rules. For details on enabling log collection for a TKE cluster and the basics of using it, refer to the official Log Collection documentation.

What is the technical architecture?

After log collection is enabled for a TKE cluster, tke-log-agent is deployed on every node as a DaemonSet. It collects the logs of the containers on its node according to the collection rules and reports them to the CLS log service, which handles unified storage, retrieval, and analysis:

img

Where are the logs collected?

To use log collection in TKE, you need to create a new rule under Cluster Ops > Log Rules. The first thing to determine is which data source to collect. The three supported types of data sources, together with their usage scenarios and recommendations, are described below.

Collect standard output

The simplest and most recommended approach is to have the containers in the Pod write their logs to standard output. The log content is then managed by the container runtime (docker or containerd), which has the following benefits:

  1. No additional volume is required.
  2. You can view the log content directly with kubectl logs.
  3. The application does not need to handle log rotation. The runtime stores the logs and rotates them automatically, which prevents a Pod with a large log volume from filling up the disk.
  4. You do not need to care about log file paths, so collection rules can be more unified: fewer rules cover more workloads, which reduces operational complexity.

Sample collection configuration:

img
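On the application side, nothing special is required to log to standard output. Below is a minimal sketch, assuming a Python application (the log message itself is illustrative):

```python
import logging
import sys

# Log to stdout; the container runtime (docker/containerd) stores and
# rotates the output, so no volume or log file path is needed.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.info("request handled: method=POST url=/event/dispatch status=200")
```

The same output is what you see with kubectl logs <pod-name>.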

Collect files in the container

In many cases the application records logs by writing log files. When the application runs in a container, the log files are written inside the container:

  1. If no volume is mounted at the log file path, the log file is written to the container's writable layer and stored on the container data disk, usually under /var/lib/docker (it is recommended to mount a data disk at this path to avoid sharing the system disk). The logs are cleared after the container stops.
  2. If a volume is mounted at the log file path, the log file is written to the storage backend of the corresponding volume type. emptydir is the usual choice: the logs are cleared after the container stops, and while the container is running the log files are written to the host disk under /var/lib/kubelet, a path that usually has no separate disk mounted and therefore uses the system disk. Since log collection already provides unified storage, it is not recommended to mount other persistent storage (such as CBS cloud disks, COS object storage, or CFS shared storage) just to hold log files.

Many open-source log collectors require a volume to be mounted at the Pod's log file path in order to collect it, but this is not necessary with TKE log collection: if you write logs to a file inside the container, you do not need to care whether a volume is mounted.

Sample collection configuration:

img

Collect files on the host

If the application writes logs to a file and you want to keep the original log file even after the container stops, as a backup that avoids losing the logs entirely if collection fails, you can mount a hostPath volume at the log file path. The log files are then written to the specified directory on the host and are not cleaned up when the container stops.

Since the log files are not cleaned up automatically, you may worry about duplicate collection: for example, a Pod is scheduled away and then scheduled back, writing log files to the same path as before. Whether collection is repeated depends on two cases:

  1. The file name is the same, for example a fixed path such as /data/log/nginx/access.log. Collection is not repeated, because the collector remembers how far into the log file it has already collected and only collects the incremental part.
  2. The file name is different. This usually happens when the application's logging framework rotates logs on a schedule (typically daily), automatically renaming the old log file with a timestamp suffix. If the collection rule uses the wildcard "*" to match log file names, collection may be repeated: after the logging framework renames a log file, the collector treats it as a newly written file and collects it again.

So in general collection is not repeated. If the logging framework rotates logs automatically, it is recommended not to use the wildcard "*" to match log files in the collection rule.
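As an illustration of case 2, the sketch below uses Python's standard TimedRotatingFileHandler to rotate a log file daily (the path /data/log/app/app.log is hypothetical). The rotated copies get a date suffix such as app.log.2019-01-22, so a collection rule matching /data/log/app/app.log* would pick them up again, while a rule matching only /data/log/app/app.log collects just the active file incrementally:

```python
import logging
from logging.handlers import TimedRotatingFileHandler

# Rotate once a day at midnight; old files are renamed with a date suffix,
# e.g. app.log.2019-01-22 (the path is hypothetical).
handler = TimedRotatingFileHandler(
    "/data/log/app/app.log", when="midnight", backupCount=7
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("hello")  # goes to app.log; yesterday's entries live in a renamed file
```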

Sample collection configuration:

img

Where are the logs shipped to?

After knowing where the data is collected from, we also need to know where the collected logs are stored. According to the architecture described above, TKE log collection is integrated with CLS, the cloud log service, and log data is reported to CLS for unified handling. CLS manages logs through logsets and log topics: a logset is the project-management unit of CLS and can contain multiple log topics. Generally, logs of the same business go into the same logset, and applications or services of the same category within that business share the same log topic. In TKE, log collection rules correspond one-to-one with log topics: when creating a log collection rule, you specify the logset and log topic as the consumer end. The logset is usually created in advance, and the log topic is usually created automatically:

img

After creation, you can rename the automatically created log topic as appropriate, so that it is easier to locate the right log topic during later retrieval:

img

How to configure log format parsing?

With the raw log data available, we also need to tell CLS how to parse the logs to facilitate later retrieval. When creating a log collection rule, you need to configure the log parsing format. Each configuration item is analyzed below, along with suggestions.

Which extraction mode to use?

First, we need to determine the log extraction mode. Five modes are supported: single-line text, JSON, separator, multi-line text, and full regular expression.

img

JSON is recommended, because the JSON format itself structures the log: CLS extracts each JSON key as a field name and its value as the corresponding field value, so there is no need to configure complex matching rules for the application's log output format. Log example:

{"remote_ip":"10.135.46.111","time_local":"22/Jan/2019:19:19:34 +0800","body_sent":23,"responsetime":0.232,"upstreamtime":"0.232","upstreamhost":"unix:/tmp/php-cgi.sock","http_host":"127.0.0.1","method":"POST","url":"/event/dispatch","request":"POST /event/dispatch HTTP/1.1","xff":"-","referer":"http://127.0.0.1/my/course/4","agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0","response_code":"200"}

The prerequisite for the JSON extraction mode is that the application log itself is output in JSON format. If it is not, but switching to JSON output is cheap, switching is recommended; if switching is difficult, consider the other extraction modes.
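For reference, a minimal sketch of emitting one JSON object per line so that CLS can extract the keys as fields (Python assumed; the field names mirror the sample log above):

```python
import json
import sys
import time

def log_request(remote_ip, method, url, response_code, responsetime):
    # One JSON object per line; each key becomes a field name in CLS.
    entry = {
        "remote_ip": remote_ip,
        "time_local": time.strftime("%d/%b/%Y:%H:%M:%S %z"),
        "method": method,
        "url": url,
        "response_code": response_code,
        "responsetime": responsetime,
    }
    sys.stdout.write(json.dumps(entry) + "\n")

log_request("10.135.46.111", "POST", "/event/dispatch", "200", 0.232)
```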

If the log content is single-line text in a fixed format, consider the "separator" or "full regular expression" extraction mode. "Separator" suits simple formats in which the field values of each log are separated by a fixed string, such as ":::". For example, a log reads:

10.20.20.10 ::: [Tue Jan 22 14:49:45 CST 2019 +0800] ::: GET /online/sample HTTP/1.1 ::: 127.0.0.1 ::: 200 ::: 647 ::: 35 ::: http://127.0.0.1/

You can configure ":::" as the custom separator and assign a field name to each field in order, for example:

img
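To make the mapping concrete, the Python sketch below splits the sample line on ":::" the same way the separator mode does (the field names are illustrative; use whatever you configure in the console):

```python
line = ("10.20.20.10 ::: [Tue Jan 22 14:49:45 CST 2019 +0800] ::: "
        "GET /online/sample HTTP/1.1 ::: 127.0.0.1 ::: 200 ::: 647 ::: 35 ::: "
        "http://127.0.0.1/")

# Field names are assigned in order, mirroring the console configuration.
field_names = ["client_ip", "time_local", "request", "upstream_host",
               "status", "bytes_sent", "request_time", "referer"]

fields = dict(zip(field_names, (part.strip() for part in line.split(":::"))))
print(fields["status"])  # -> 200
```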

"Complete Regular" applies to complex formats, and uses regular expressions to match the log format. For example, the log content is:

10.135.46.111 - - [22/Jan/2019:19:19:30 +0800] "GET /my/course/1 HTTP/1.1" 127.0.0.1 200 782 9703 "http://127.0.0.1/course/explore?filter%5Btype%5D=all&filter%5Bprice%5D=all&filter%5BcurrentLevelId%5D=all&orderBy=studentNum" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0"  0.354 0.354

The regular expression can be set as:

(\S+)[^\[]+(\[[^:]+:\d+:\d+:\d+\s\S+)\s"(\w+)\s(\S+)\s([^"]+)"\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s"([^"]+)"\s"([^"]+)"\s+(\S+)\s(\S+).*

CLS uses the () capture groups to distinguish the fields; we also need to set a field name for each field. Configuration example:

img
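Before saving the rule, it can be worth testing the regular expression against a sample line, for example in Python (the group numbers correspond to the fields named in the console; the shortened referer is illustrative):

```python
import re

pattern = re.compile(
    r'(\S+)[^\[]+(\[[^:]+:\d+:\d+:\d+\s\S+)\s"(\w+)\s(\S+)\s([^"]+)"\s(\S+)'
    r'\s(\d+)\s(\d+)\s(\d+)\s"([^"]+)"\s"([^"]+)"\s+(\S+)\s(\S+).*'
)

line = ('10.135.46.111 - - [22/Jan/2019:19:19:30 +0800] "GET /my/course/1 HTTP/1.1" '
        '127.0.0.1 200 782 9703 "http://127.0.0.1/course/explore?filter%5Btype%5D=all" '
        '"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0"  0.354 0.354')

match = pattern.match(line)
print(match.group(1))  # -> 10.135.46.111  (first capture group)
print(match.group(7))  # -> 200            (status code field)
```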

If the log has no fixed output format, consider the "single-line text" or "multi-line text" extraction mode. With these two modes the log content is not structured and no fields are extracted; the timestamp of each log is fixed to the collection time, and only simple fuzzy queries are possible during retrieval. The difference between the two modes is whether a log entry spans one line or several. Single-line mode is the simplest: no matching condition needs to be set, and each line is a separate log. Multi-line mode requires a first-line regular expression, i.e. a regular expression that matches the first line of each log entry. A line that matches it is treated as the beginning of a log entry, and the next matching line marks the end of the previous entry. For example, if the content of a multi-line log is:

10.20.20.10 - - [Tue Jan 22 14:24:03 CST 2019 +0800] GET /online/sample HTTP/1.1 127.0.0.1 200 628 35 http://127.0.0.1/group/1 
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0 0.310 0.310

Then the first-line regular expression can be set to: \d+\.\d+\.\d+\.\d+\s-\s.*

img
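The Python sketch below groups raw lines into log entries using that first-line regular expression, which is roughly what the collector does in multi-line mode:

```python
import re

first_line = re.compile(r"\d+\.\d+\.\d+\.\d+\s-\s.*")

raw_lines = [
    "10.20.20.10 - - [Tue Jan 22 14:24:03 CST 2019 +0800] GET /online/sample "
    "HTTP/1.1 127.0.0.1 200 628 35 http://127.0.0.1/group/1",
    "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0 0.310 0.310",
]

entries, current = [], []
for line in raw_lines:
    if first_line.match(line) and current:
        # A new matching first line closes the previous log entry.
        entries.append("\n".join(current))
        current = []
    current.append(line)
if current:
    entries.append("\n".join(current))

print(len(entries))  # -> 1: both raw lines belong to a single log entry
```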

How to filter out unwanted content?

Logs that are unimportant or of no interest can be filtered out to reduce costs.

If you use the "JSON", "separator", or "full regular expression" extraction mode, the log content is structured and you can specify fields to match the logs to retain:

img

For the "single-line text" and "multi-line text" extraction modes, the log content is not structured, so you cannot filter by a specific field; instead you usually use a regular expression to fuzzy-match the complete log content to retain:

img

Note that the match is a regular expression match, not an exact string match. For example, to retain only logs for the domain a.test.com, the match expression should be written as a\.test\.com rather than a.test.com.
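A quick Python check shows the difference (the log line is illustrative):

```python
import re

log = "GET http://a-test.com/index 200"  # not the domain we want to keep

print(bool(re.search(r"a.test.com", log)))    # True: the unescaped "." matches any character
print(bool(re.search(r"a\.test\.com", log)))  # False: only the literal domain a.test.com matches
```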

How to customize the log timestamp?

Each log needs a timestamp, which is used mainly for retrieval, where you can select a time range. By default the timestamp of a log is the collection time, but you can also customize it by selecting a log field as the timestamp, which is more accurate in some cases. For example, if the service had already been running for some time before the collection rule was created and no custom time format is set, the old logs get the current time as their timestamp when they are collected, which is inaccurate.

How do you customize it? Since the "single-line text" and "multi-line text" extraction modes do not structure the log content, there is no field that can be designated as the timestamp, so custom time parsing is not possible; the other extraction modes support it. The method is to turn off "Use collection time", then select the field to use as the timestamp and configure its time format.

For example, to use the log's time field as the timestamp, where a log's time value is 2020-09-22 18:18:18, the time format can be set to %Y-%m-%d %H:%M:%S, as in the example:

img
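The format uses strftime-style directives; a quick Python check of the format string against the sample value (just for local validation, not part of the configuration):

```python
from datetime import datetime

# "%Y-%m-%d %H:%M:%S" parses a value like "2020-09-22 18:18:18"
ts = datetime.strptime("2020-09-22 18:18:18", "%Y-%m-%d %H:%M:%S")
print(ts)  # -> 2020-09-22 18:18:18
```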

For more on configuring the time format, refer to the official CLS documentation on time format configuration.

Note that CLS timestamps are currently accurate only to the second. If the timestamp field of the application log has millisecond precision, a custom timestamp cannot be used and only the default collection time can serve as the timestamp; millisecond-precision timestamps will be supported later.

How to query logs?

Once the log collection rules are configured, the collector automatically starts collecting logs and reporting them to CLS. You can then query the logs under Log Service > Search and Analysis, with support for Lucene syntax, provided that indexing is enabled. There are three categories of indexes:

  1. Full-text index. Used for fuzzy search without specifying the field.
    img
  2. Key-value index. Indexes the structured log content, so you can specify log fields for retrieval.
    img
  3. Meta field index. Some additional fields, such as pod name and namespace, are automatically appended when logs are reported, so you can also specify these fields for retrieval.

img

Example query:

img

How to ship logs elsewhere?

CLS supports shipping logs to COS object storage and Ckafka (Kafka hosted by Tencent Cloud); shipping can be configured on the log topic:

img

It can be used in the following scenarios:

  1. Long-term archival of log data. A logset stores 7 days of log data by default; the retention period can be adjusted, but the more data retained, the higher the cost, so usually only a few days of data are kept. If logs need to be stored longer, they can be shipped to COS for low-cost storage.
  2. Logs that need further processing (such as offline computation) can be shipped to COS or Ckafka and processed by other programs.
