ELFK log platform, Part 1 --- Architecture design

ELFK log platform, Part 2 --- Elasticsearch cluster setup

ELFK log platform, Part 3 --- Kibana setup

ELFK log platform, Part 4 --- Kafka cluster setup

ELFK log platform, Part 5 --- Logstash + Filebeat cluster setup

 1. What is ELK?

        Logs are an important part of any system, and even more so in computer systems. Most modern systems are complex: many are distributed across multiple locations, sometimes even across regions, and even a single system in one place has a variety of log sources, such as the operating system, application services, and business logic. They continuously produce all kinds of log data. According to rough estimates, the world generates about 2 EB of data per day (1 EB = 1024 PB, 1 PB = 1024 TB).

        Faced with such vast amounts of data scattered across many places, if we need to find some important information, should we still log in to each machine and inspect it the traditional way? Traditional tools and methods have become clumsy and inefficient. So a centralized approach was proposed: gather the data from different sources into one central place.

        A complete centralized logging system depends on the following key features:

  • Collection - Log data can be collected from multiple sources
  • Transmission - can stably transmit the log data to the central system
  • Storage - can store the log data
  • Analysis - supports UI-based analysis
  • Warning - can provide error reporting and monitoring mechanisms

      The open-source real-time log analysis platform ELK neatly solves the problems described above. ELK consists of three open-source tools: Elasticsearch, Logstash, and Kibana.

      Elasticsearch: an open-source distributed search engine. Its features include: distributed operation, zero configuration, automatic discovery, automatic index sharding, an index replication mechanism, a RESTful interface, multiple data sources, and automatic search load balancing.

      Logstash: a fully open-source tool that can collect and parse your logs and store them for later use.

      Kibana: an open-source, free tool that provides a friendly web interface for the logs collected by Logstash and stored in Elasticsearch, helping you summarize, analyze, and search important log data.

      ELK is not really a single piece of software but rather a set of solutions; the name is the acronym of the three products Elasticsearch, Logstash, and Kibana. All three are open-source software, usually used together, and all belong to the company Elastic.co, so the combination is referred to as the ELK stack.

      Official website: https://www.elastic.co/cn/ ; Chinese documentation: https://elkguide.elasticsearch.cn/

      Download old versions of each ELK component: https://www.elastic.co/downloads/past-releases

     Logstash is deployed on all servers whose logs need to be collected; it collects the logs generated by the AppServer and ships them to the full-text search service Elasticsearch, while Kibana queries data from the ES cluster to generate charts, which are then returned to the client browser.

   2. Overall architecture design

     Here is the traditional ELK architecture (the examples below use a three-node cluster):

     Based on the traditional ELK architecture, we made some architectural transformations and optimizations. After the transformation, the overall architecture is as follows:

    As the architecture diagram above shows, compared with the traditional ELK stack (Elasticsearch + Logstash + Kibana), two components have been added (Filebeat + Kafka). The traditional ELK setup has the following problems:

    Problem 1: If Logstash collects logs directly, its fatal flaw is its performance and resource consumption (the default heap size is 1 GB), which makes log collection inefficient. Although its performance has improved a great deal in recent years, it is still slow compared with its replacements; there are published performance comparisons of Logstash and Filebeat. With large data volumes this becomes a real problem.

   Filebeat: As a member of the Beats family, Filebeat is a lightweight log shipper that makes up for Logstash's shortcoming: as a lightweight shipper, Filebeat can push logs to a central Logstash. Since version 5.x, Elasticsearch has had parsing abilities of its own (like Logstash filters) via Ingest. This means Filebeat can push data directly to Elasticsearch and let Elasticsearch do both the parsing and the storing. A buffer is not strictly required, because Filebeat, like Logstash, remembers the offset of the last read.
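As a rough illustration, a minimal Filebeat configuration for this architecture could look like the following sketch. The log path, Kafka broker addresses, and topic name `app-logs` are assumptions for illustration, not values from the original setup:

```yaml
# filebeat.yml -- minimal sketch: ship application log files to Kafka
filebeat.inputs:
  - type: log                      # tail plain log files
    paths:
      - /var/log/app/*.log         # assumed application log location

output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]  # assumed 3-node Kafka cluster
  topic: "app-logs"                # assumed topic name
  required_acks: 1                 # leader acknowledgement is enough for log data
```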

   Problem 2: Logstash currently does not support caching; if collected logs are pushed directly to Elasticsearch without any buffer, a short-term surge of logs risks bringing Elasticsearch down. The typical current solution is to use Redis or Kafka as a central buffer pool.

    Kafka: Kafka was originally developed by LinkedIn. It is a distributed message system that supports partitions (partition) and multiple replicas (replica), coordinated by ZooKeeper. Its greatest strength is the ability to process large amounts of data in real time to meet diverse scenarios: Hadoop-based batch systems, low-latency real-time systems, Storm/Spark streaming engines, web/nginx logs, access logs, messaging services, and so on. It is written in Scala; LinkedIn contributed it to the Apache Foundation in 2010, and it became a top-level open-source project.

   Problem 3: How to effectively collect container logs? The current effective methods are: 1) mount the log directory onto the host; 2) use the log-collecting middleware Log-Pilot to collect container logs.

    Log-Pilot: log-pilot is Alibaba's open-source container log collection solution, provided as a ready-made log collection image. Deploy one log-pilot instance on each machine and it can collect the logs of all Docker applications on that machine.

    So how do you choose between Redis and Kafka?

  • Message push reliability:

    Redis message push (distributed Pub/Sub) is designed for real-time delivery and does not guarantee reliability. Redis Pub/Sub loses its data on power failure, and although using a Redis List as a message queue does offer persistence, it is still not fully reliable against loss.

    Kafka has some latency, but it guarantees reliable delivery.

  • Subscription grouping:

    Redis publish/subscribe supports distinct topics, but does not support grouping.

    When Kafka publishes a message, multiple subscribers can be organized into groups; within a group, only one subscriber receives a given message, which can be used for load balancing.
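This grouping behavior can be illustrated with a small stand-alone simulation (plain Python, no Kafka client involved; the assignment function is a toy model of Kafka's round-robin assignor, not its real API): partitions are divided among the consumers of a group, so each message is processed by exactly one consumer in that group, while a second group independently still sees everything.

```python
# Toy simulation of Kafka-style consumer groups (no real Kafka involved).
# Each partition is assigned to exactly one consumer within a group, so a
# message is consumed once per group -- this is the load-balancing property.

def assign_partitions(partitions, consumers):
    """Round-robin partition assignment, loosely modeled on Kafka's RoundRobinAssignor."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3, 4, 5]          # one topic with 6 partitions
group_a = ["a1", "a2", "a3"]             # consumer group A: 3 members
group_b = ["b1"]                         # consumer group B: 1 member

# Within group A, partitions (and hence messages) are spread across members...
print(assign_partitions(partitions, group_a))  # {'a1': [0, 3], 'a2': [1, 4], 'a3': [2, 5]}
# ...but group B independently receives all partitions as well.
print(assign_partitions(partitions, group_b))  # {'b1': [0, 1, 2, 3, 4, 5]}
```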

  • Cluster resource consumption:

    Since version 3.0, Redis has provided an HA cluster mechanism, but each master node must be configured with one or more slave nodes; the slaves pull data from their master, and when the master goes down, a slave is promoted to master. This is comparatively wasteful of resources.

    As a message queue, a Kafka cluster can use its resources fully: each application corresponds to a topic, a topic can have multiple partitions, and the partitions can be assigned round-robin across the nodes. Producers write data evenly into the partitions, so even with only one application the Kafka cluster's resources are fully utilized, avoiding the data-skew problem that Redis clusters can suffer from. Kafka also has a redundancy mechanism similar to HDFS: if one broker goes down, the operation of the whole cluster is not affected.

  • Throughput:

     Thanks to its partitioning and sparse-indexing mechanisms, Kafka can withstand hundred-million-level throughput.

     Based on the above considerations, Kafka is chosen as the data buffer.

   After the architecture design, the overall flow of logs through the platform is as follows:

  1. Filebeat, deployed on the servers whose logs are to be collected, gathers the log data and sends it to Kafka.
  2. Kafka stores the received log messages and serves as the input (input) to Logstash.
  3. Logstash takes the data from Kafka as input, performs operations such as filtering on it, and then outputs (output) the resulting data to Elasticsearch.
  4. Elasticsearch stores the data processed by Logstash and provides it as input to Kibana for display.
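Steps 2-4 above correspond to a Logstash pipeline with a Kafka input and an Elasticsearch output. A minimal sketch follows; the broker addresses, topic `app-logs`, group id, and index pattern are all assumptions for illustration:

```conf
# logstash.conf -- minimal sketch: consume from Kafka, write to Elasticsearch
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092,kafka3:9092"  # assumed brokers
    topics            => ["app-logs"]                           # assumed topic name
    group_id          => "logstash-group"                       # one group => load-balanced consumption
  }
}

filter {
  # filtering / enrichment would go here, e.g. grok or mutate
}

output {
  elasticsearch {
    hosts => ["http://es1:9200", "http://es2:9200", "http://es3:9200"]  # assumed ES nodes
    index => "app-logs-%{+YYYY.MM.dd}"                                  # one index per day
  }
}
```

Running multiple Logstash instances with the same `group_id` lets Kafka spread the partitions across them, which is exactly the load-balancing property discussed above.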

   The following sections will walk you through the whole build process.


Origin blog.csdn.net/u014526891/article/details/102837814