ELK log collection system solution

Background

In the early days of a project everyone is rushing to get online, and logging rarely gets much thought; the volume is small, so log4j is enough. Even then, digging through the logs folder on each server is inconvenient, and in a distributed system it gets worse: when you need log analysis, you probably just run grep and awk directly against the log files to pull out the information you want. That makes log queries extremely cumbersome, and if the logs contain sensitive data you also have to consider whether server access should be open to everyone.

Possible problems:

  • How do you archive logs once they grow too large?
  • Full-text search over raw files is too slow, and multi-dimensional queries are impractical.
  • What do you do when there are dozens or hundreds of applications? Logging into servers at will to query logs will certainly affect the stability and security of the system, and a user unfamiliar with Linux is helpless in front of huge log files.

So why not use ELK? What problems can ELK solve for us?

Introduction to ELK components

  • Filebeat: a log file shipper. After the client is installed on your server, Filebeat monitors the log directory or the specified log files, tracking changes and continuously reading new content (written in Go)
  • Kafka: a high-throughput distributed publish-subscribe messaging system that can handle all the action-stream data of a consumer-scale website
  • Logstash: [this component is only required if you do deeper processing of the logs] a pipeline with real-time data transmission capability, responsible for moving data from the pipeline's input end to its output end; it also lets you add filters in the middle as needed, and Logstash ships many powerful filter plugins to cover a wide range of scenarios
  • Elasticsearch: provides a distributed, multi-user full-text search engine, exposed through a RESTful web interface
  • Kibana: the user interface for Elasticsearch

In a typical scenario, to support real-time retrieval over big data, Filebeat monitors the log files and Kafka serves as Filebeat's output. Logstash consumes what Kafka receives in real time; because the data arriving at Logstash may not yet be the formatted, business-specific data we want, Logstash's filter plugins can clean it until it reaches the desired format. Elasticsearch then serves as Logstash's output and provides rich distributed retrieval over the data, and Kibana presents the data held in Elasticsearch to the user.
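As a concrete illustration of the first hop of that flow, below is a minimal Filebeat configuration sketch (Filebeat 7.x syntax) that watches a log directory and publishes new lines to Kafka; the log path, broker address, and topic name are placeholder assumptions, not values from the original article.

# filebeat.yml -- watch application log files and ship new lines to Kafka
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log        # hypothetical application log path
output.kafka:
  hosts: ["127.0.0.1:9092"]         # assumed local Kafka broker
  topic: "app-log"                  # hypothetical topic name

Logstash then consumes this topic, filters the events, and writes them to Elasticsearch; a matching pipeline sketch appears under Scheme 4 below.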
| Name | Abbreviation | Composition | Description | Recommended use |
| --- | --- | --- | --- | --- |
| Scheme 1 | ELK | logstash + es + kibana | Logstash deployments generally eat memory | Classic mode |
| Scheme 2 | EFK | filebeat + es + kibana | Filebeat is much lighter than Logstash, so this architecture suits small and medium log collection systems; Logstash offers rich filtering, but many systems never use it, so Filebeat can collect and ship the logs directly | Lightweight and non-intrusive |
| Scheme 3 | FELK | filebeat + logstash + es + kibana | A single Logstash can receive the logs collected by Filebeat, filter them centrally, and forward them to es; with enough machines, Logstash can also be deployed as a cluster to spread the load | Systems that need to process the collected log data |
| Scheme 4 | Personalized framework | FELK + kafka or redis | Introduces message middleware as a buffer | High-concurrency, high-traffic big data systems |

Generally speaking, schemes 1 and 2 see the most use, and I believe scheme 1 alone will serve most companies.

Scheme 1: ELK

(Architecture diagram: applications → Logstash → Elasticsearch → Kibana)
Advantages: easy to integrate, minimally intrusive to the application, easy to build
Drawbacks:

  • If many web applications all send their logs directly to Logstash at the same time, Logstash may not withstand the pressure
  • If other systems, such as a big data platform, also need to consume the logs, this architecture is a poor fit
  • Logstash is heavyweight middleware and consumes a lot of memory

Integration steps
Step 1: add the logstash-logback-encoder dependency to the Spring Boot application's pom.xml
<!-- integrate logstash -->
<dependency>
    <groupId>net.logstash.logback</groupId>
    <artifactId>logstash-logback-encoder</artifactId>
    <version>5.3</version>
</dependency>

Step 2: modify the content of the logback-spring.xml file in the web application as follows

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE configuration>
<configuration>
    <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
    <include resource="org/springframework/boot/logging/logback/console-appender.xml"/>
    <!-- application name -->
    <property name="APP_NAME" value="APP_NAME"/>
    <!-- log file save path -->
    <property name="LOG_FILE_PATH" value="${LOG_FILE:-${LOG_PATH:-${LOG_TEMP:-${java.io.tmpdir:-/tmp}}}/logs}"/>
    <contextName>${APP_NAME}</contextName>
    <!-- appender that rolls the log to a new file each day -->
    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>${LOG_FILE_PATH}/${APP_NAME}-%d{yyyy-MM-dd}.log</fileNamePattern>
            <maxHistory>30</maxHistory>
        </rollingPolicy>
        <encoder>
            <pattern>${FILE_LOG_PATTERN}</pattern>
        </encoder>
    </appender>
    <!-- appender that ships logs to logstash -->
    <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
        <!-- reachable logstash log collection address and port -->
        <destination>127.0.0.1:4560</destination>
        <encoder charset="UTF-8" class="net.logstash.logback.encoder.LogstashEncoder"/>
    </appender>
    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
        <appender-ref ref="FILE"/>
        <appender-ref ref="LOGSTASH"/>
    </root>
</configuration>
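For reference, a minimal Logstash pipeline matching the appender above might look like the sketch below; the port must agree with the <destination> in logback-spring.xml, while the Elasticsearch address and index name are placeholder assumptions.

# logstash.conf -- receive JSON events from the application over TCP
input {
  tcp {
    port => 4560                       # must match <destination> above
    codec => json_lines                # LogstashEncoder emits JSON lines
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]        # assumed local Elasticsearch
    index => "app-log-%{+YYYY.MM.dd}"  # hypothetical daily index
  }
}

Start it with bin/logstash -f logstash.conf; after that, every ordinary slf4j logging call in the application is shipped to Logstash as a JSON event, in addition to being written to the console and the rolling file.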

Scheme 2: EFK

Filebeat synchronizes the log data directly to es.
(Architecture diagram: Filebeat → Elasticsearch → Kibana)
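A minimal Filebeat configuration sketch for this scheme, with the output pointed straight at Elasticsearch (log path and Elasticsearch address are placeholder assumptions):

# filebeat.yml -- ship log lines directly to Elasticsearch, no Logstash
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log      # hypothetical application log path
output.elasticsearch:
  hosts: ["127.0.0.1:9200"]       # assumed local Elasticsearch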

Scheme 3: FELK

If you need to process the data before it reaches es, you need Logstash.
(Architecture diagram: Filebeat → Logstash → Elasticsearch → Kibana)
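A minimal sketch of this wiring, assuming Filebeat ships to Logstash's conventional beats port 5044 and Logstash forwards to a local Elasticsearch; all addresses are placeholder assumptions.

# filebeat.yml -- only the output section differs from the EFK sketch
output.logstash:
  hosts: ["127.0.0.1:5044"]       # assumed Logstash address

# logstash.conf -- receive from Filebeat, filter centrally, forward to es
input {
  beats {
    port => 5044
  }
}
filter {
  # centralized processing goes here, e.g. grok / mutate / date plugins
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
  }
}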

Scheme 4: Personalized framework

When the collected logs need processing, or are required by multiple consumers, Kafka or Redis is introduced. The general shape is Filebeat + message middleware + Logstash (optional: events can also flow from the middleware straight into es) + es + kibana.

This is the back-end log collection architecture of large Internet companies: it consumes significant server resources, and may only be worthwhile for the core business of the top Internet companies.

Traffic is buffered and rate-limited through the redis/kafka middleware layer.
(Architecture diagram: Filebeat → Kafka/Redis → Logstash → Elasticsearch → Kibana)
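To complete the picture, here is a minimal Logstash pipeline sketch that consumes the Kafka topic from the Filebeat sketch in the component introduction and forwards the filtered events to Elasticsearch; the broker, topic, and index names are the same placeholder assumptions as before.

# logstash.conf -- consume from Kafka, filter, forward to es
input {
  kafka {
    bootstrap_servers => "127.0.0.1:9092"  # assumed Kafka broker
    topics => ["app-log"]                  # hypothetical topic name
    codec => "json"                        # Filebeat publishes JSON events
  }
}
filter {
  # reshape raw events into the desired business format here
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "app-log-%{+YYYY.MM.dd}"
  }
}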

To sum up:

Each solution has its own advantages and disadvantages; using the right solution in the right scenario is the goal.


Source: blog.csdn.net/qq_38130094/article/details/115333335