ELK log analysis system (theory + build)

1. Introduction to ELK log analysis system

1.1 Overview of ELK log analysis system

The ELK log analysis system is a stack of three open source tools: Elasticsearch, Logstash, and Kibana. Together they provide an open source log management solution that can search, analyze, and visualize logs from any source and in any format.

1.2. Log processing steps

1. Centralized management of logs

2. Format the log (Logstash) and output to Elasticsearch

3. Index and store the formatted data (Elasticsearch)

4. Display of front-end data (Kibana)

1.3 Introduction to ElasticSearch

Elasticsearch is developed in Java and provides a distributed, multi-user full-text search engine. It is designed for cloud environments, delivers real-time search, and is stable, reliable, fast, and easy to install and use.

The basic core concepts of Elasticsearch:

(1) Near real time (NRT)
Elasticsearch is a near-real-time search platform, which means there is a slight delay (usually about one second) from the time a document is indexed until it becomes searchable.

(2) Cluster
A cluster contains one or more nodes. Which cluster a node belongs to is determined by its configured cluster name (the default is "elasticsearch"). For small and medium-sized applications, it is normal for a cluster to have only one node at the beginning.

(3) Node (node)
A node is a single server in the cluster. Each node has a name (randomly assigned by default), and the node name matters when performing operation and maintenance work. By default a node joins the cluster named "elasticsearch", so if you simply start a group of nodes they will automatically form an elasticsearch cluster; of course, a single node can also form a cluster on its own.

(5) Index (index)
An index is a collection of documents with somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and an index for order data. An index is identified by a name (which must be all lowercase letters), and this name must be used whenever you index, search, update, or delete documents in it. In a cluster, you can define as many indexes as you want.
In SQL terms, an index is analogous to a database. ElasticSearch stores its data in one or more indexes; you write documents to an index or read documents from it, and ElasticSearch uses Lucene under the hood to write data to and retrieve data from the index.

(6) Document (document)
A document is the main entity in ElasticSearch. Everything you do with ElasticSearch ultimately comes down to searching documents. A document is composed of fields.

(7) Shards
Shards are index shards. Elasticsearch can split a complete index into multiple shards, which allows a large index to be divided up and distributed across different nodes, forming a distributed search. The number of shards can only be specified before the index is created and cannot be changed afterwards. Since 5.x, the default sharding can no longer be defined in the configuration file.

(8) Replicas
Replicas are index replicas. Elasticsearch can keep multiple replicas of an index. Their first purpose is to improve fault tolerance: when a node is damaged or lost, data can be recovered from a replica. Their second purpose is to improve query efficiency, since Elasticsearch automatically load-balances search requests across replicas.
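As a small illustration (the index name test-index and the numbers below are made up for this example), both settings are usually supplied when an index is created through the standard index-creation API:

curl -XPUT 'localhost:9200/test-index?pretty' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'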

1.4 Introduction to Logstash

Logstash is a completely open source tool that can collect, filter, format, and output your logs, storing them for later use (e.g., search).

Logstash is written in JRuby, follows a simple message-based architecture, and runs on the Java Virtual Machine (JVM). Rather than being a fixed standalone agent or server, Logstash can be deployed as a single agent combined with other open source software to achieve different functions.

The philosophy of Logstash is very simple: it does only three things (a minimal sketch follows the list):

Collect: Data input
Enrich: Data processing, such as filtering, rewriting, etc.
Transport: Data output
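
These three stages map directly onto the input, filter, and output sections of a Logstash pipeline. As a rough sketch (the added field "tag" is made up purely for illustration), a one-line pipeline touching all three stages looks like this:

logstash -e 'input { stdin{} } filter { mutate { add_field => { "tag" => "demo" } } } output { stdout{ codec => rubydebug } }'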

(1) The main components of LogStash:

Shipper: Log collector. Monitors changes to local log files and collects the latest log content in time; usually the remote agent only needs to run this component.

Indexer: The log store. Responsible for receiving logs and writing to local files.

Broker: Log hub. Responsible for connecting multiple Shippers and multiple Indexers.

Search and Storage: allows searching and storing events;

Web Interface: Web-based display interface

Because these components can be deployed independently, the LogStash architecture provides good cluster scalability.

(2) LogStash host classification:

Agent host: acts as the event shipper and sends various log data to the central host; it only needs to run the Logstash agent program.

Central host: runs the full set of components, including the Broker, Indexer, Search and Storage, and Web Interface, to receive, process, and store log data.

1.5 Introduction to Kibana

Kibana is an open source analysis and visualization platform for Elasticsearch.

It lets you search and interactively view data stored in Elasticsearch indexes.

Advanced data analysis and display can be performed through various charts.

Main features of Kibana:

Seamless integration with Elasticsearch

Data integration and complex data analysis

Brings benefits to more team members

Flexible interface that is easy to share

Simple configuration and visualization of multiple data sources

Simple data export

2. Experimental system construction

2.1 Experimental environment

VMware virtual machines:
One CentOS 7.4 virtual machine, IP address 20.0.0.21, host name node1, running Elasticsearch

One CentOS 7.4 virtual machine, IP address 20.0.0.22, host name node2, running Elasticsearch and Kibana

One CentOS 7.4 virtual machine, IP address 20.0.0.23, host name apache, running Logstash and Apache

The firewall and SELinux (core protection) are turned off
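
On each of the three machines this typically means something like the following (a sketch; "core protection" is assumed here to mean SELinux):

systemctl stop firewalld
systemctl disable firewalld
setenforce 0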

2.2 Configure the Elasticsearch environment

(1) Log in to 20.0.0.21, change the host name, configure name resolution in /etc/hosts, and check the Java environment (install Java if it is not present)

hostnamectl set-hostname node1

vi /etc/hosts
20.0.0.21 node1
20.0.0.22 node2

java -version
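
If java -version reports that no Java is installed, OpenJDK 1.8 can be installed from the standard CentOS 7 repositories (assuming they are reachable):

yum -y install java-1.8.0-openjdk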

(2) Install Elasticsearch
Upload the prepared rpm package to the server and install it:

 rpm -ivh elasticsearch-5.5.0.rpm 

 systemctl daemon-reload 

 systemctl enable elasticsearch.service 

(3) Modify the configuration file

cp -p /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.bak

vim /etc/elasticsearch/elasticsearch.yml
17 cluster.name: my-elk-cluster
23 node.name: node1
33 path.data: /data/elk_data
37 path.logs: /var/log/elasticsearch/
43 bootstrap.memory_lock: false
55 network.host: 0.0.0.0
59 http.port: 9200
68 discovery.zen.ping.unicast.hosts: ["node1", "node2"]
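
The edited values can be double-checked by listing only the non-comment lines (just a convenience check):

grep -v "^#" /etc/elasticsearch/elasticsearch.yml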

(4) Create data storage path and authorize

 mkdir -p /data/elk_data/
 chown elasticsearch:elasticsearch /data/elk_data/

(5) Start Elasticsearch

systemctl start elasticsearch.service
netstat -antp | grep 9200
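
The service can also be queried directly from the command line; the JSON response should include the node name and cluster name:

curl http://20.0.0.21:9200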

(6) Open 20.0.0.21:9200 in the host machine's browser
(7) Configure node2 (20.0.0.22). The Elasticsearch setup is almost the same as above: set the host name to node2, change line 23 of the configuration file to node.name: node2, and then open 20.0.0.22:9200 in the host machine's browser.
(8) Cluster health check and status
Enter 20.0.0.21:9200/_cluster/health?pretty in the host machine's browser (20.0.0.22:9200/_cluster/health?pretty also works).
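
The same check can be run from the command line on either node, and _cluster/state gives more detail (both are standard Elasticsearch endpoints):

curl 'http://20.0.0.21:9200/_cluster/health?pretty'
curl 'http://20.0.0.21:9200/_cluster/state?pretty'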

2.3 Install the elasticsearch-head plugin on node1 and node2

The view provided after the installation above is not very intuitive, so we install the elasticsearch-head plugin to view and manage the cluster more conveniently. The following only shows the steps on node1.
(1) Compile and install the Node.js dependency package
Upload the package to the home directory

   yum -y install gcc-c++ make gcc
   
   tar zxvf node-v8.2.1.tar.gz 
   
   cd node-v8.2.1/
   
    ./configure 
  
   make -j3       (compilation takes a long time, about half an hour)
   
   make install
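
After make install finishes, the installation can be verified (version numbers depend on the package you compiled):

node -v
npm -v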

(2) Install phantomjs (front-end framework)
Upload the package to /usr/local/src/

  cd /usr/local/src/
  
  tar jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2 
 
  cd phantomjs-2.1.1-linux-x86_64/bin

  cp phantomjs /usr/local/bin
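
A quick check that the binary is now on the PATH:

phantomjs --version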

(3) Install elasticsearch-head (data visualization tool)
Upload the package to /usr/local/src/

cd /usr/local/src/

tar zxvf elasticsearch-head.tar.gz

cd elasticsearch-head/

npm install

(4) Modify the main configuration file

vim /etc/elasticsearch/elasticsearch.yml
http.cors.enabled: true
http.cors.allow-origin: "*"

systemctl restart elasticsearch.service

(5) Start elasticsearch-head

cd /usr/local/src/

cd elasticsearch-head/

npm run start &

netstat -lnupt |grep 9100

Insert a test index:
curl -XPUT 'localhost:9200/index-demo/test/1?pretty&pretty' -H 'Content-Type: application/json' -d '{"user":"zhangsan","mesg":"hello world"}'
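
The test document can be read back to confirm the write succeeded (same index, type, and id as in the PUT above):

curl -XGET 'localhost:9200/index-demo/test/1?pretty'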

(6) Enter 20.0.0.21:9100 in the host machine's browser

2.4 Deploy Logstash on the Apache server

(1) Initial setup

hostnamectl set-hostname apache

yum -y install httpd

systemctl start httpd

java -version      (install Java if no Java environment is present)

(2) Install Logstash
Upload the package to /opt

 rpm -ivh logstash-5.5.1.rpm

systemctl start logstash.service

systemctl enable logstash.service

ln -s /usr/share/logstash/bin/logstash /usr/local/bin/

(3) Logstash command test (docking with the Elasticsearch node)
Option descriptions:
● -f  specify a Logstash configuration file; Logstash is configured according to that file
● -e  followed by a string that is treated as the Logstash configuration (if the string is empty, stdin is used as input and stdout as output by default)
● -t  test whether the configuration file is correct, then exit
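
For example, once a configuration file exists (the path below simply points at the system.conf created later in this article), it can be syntax-checked and then run:

logstash -f /etc/logstash/conf.d/system.conf -t
logstash -f /etc/logstash/conf.d/system.conf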

Test with standard input as the input and standard output as the output:

[root@apache opt]# logstash -e 'input { stdin{} } output { stdout{} }'

The stdin plugin is now waiting for input:
11:33:17.455 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
www.baidu.com                 (enter a URL here)
2020-10-29T03:33:26.424Z apache www.baidu.com

Test: use rubydebug to display detailed output (codec is a codec plugin):

[root@apache opt]# logstash -e 'input { stdin{} } output { stdout{ codec=>rubydebug } }'

The stdin plugin is now waiting for input:
11:30:32.066 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
www.baidu.com             (enter a URL here)
{
    "@timestamp" => 2020-10-29T03:30:59.822Z,
      "@version" => "1",
          "host" => "apache",
       "message" => "www.baidu.com"
}

Use Logstash to write information into Elasticsearch:

[root@apache opt]# logstash -e 'input { stdin{} } output { elasticsearch { hosts=>["20.0.0.21:9200"] } }'

The stdin plugin is now waiting for input:
11:43:46.129 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
www.baidu.com                      (enter a URL here)

(4) Visit the node1 node in the host machine's browser to view the index information
(5) Docking configuration
A Logstash configuration file is mainly composed of three parts: input, filter (applied as needed), and output.

chmod o+r /var/log/messages

[root@apache opt]# vi /etc/logstash/conf.d/system.conf

input {
    file {
        path => "/var/log/messages"
        type => "system"
        start_position => "beginning"
    }
}
output {
    elasticsearch {
        hosts => ["20.0.0.21:9200"]
        index => "system-%{+YYYY.MM.dd}"
    }
}

[root@apache opt]# systemctl restart logstash.service
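
After the restart, the new index can also be confirmed from the command line with the standard _cat API before checking the browser:

curl '20.0.0.21:9200/_cat/indices?v'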

View the new system index in the host machine's browser (for example, the elasticsearch-head page at 20.0.0.21:9100).

2.5 Install Kibana on the node2 host

Upload the package to /usr/local/src/

cd /usr/local/src/

rpm -ivh kibana-5.5.1-x86_64.rpm

cd /etc/kibana/

cp -p kibana.yml kibana.yml.bak

[root@node2 kibana]# vi kibana.yml
2 server.port: 5601
7 server.host: "0.0.0.0"
21 elasticsearch.url: "http://20.0.0.21:9200"
30 kibana.index: ".kibana"

[root@node2 kibana]# systemctl start kibana.service 
[root@node2 kibana]# systemctl enable kibana.service 
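
Kibana listens on port 5601 by default (as configured above), so a quick check that it is up:

netstat -lnpt | grep 5601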

Open Kibana in the host machine's browser at 20.0.0.22:5601.

2.6 Connect the Apache log files on the Apache host

cd /etc/logstash/conf.d/

touch apache_log.conf


[root@apache conf.d]# vi apache_log.conf 

input {
    file {
        path => "/etc/httpd/logs/access_log"
        type => "access"
        start_position => "beginning"
    }
    file {
        path => "/etc/httpd/logs/error_log"
        type => "error"
        start_position => "beginning"
    }
}
output {
    if [type] == "access" {
        elasticsearch {
            hosts => ["20.0.0.21:9200"]
            index => "apache_access-%{+YYYY.MM.dd}"
        }
    }
    if [type] == "error" {
        elasticsearch {
            hosts => ["20.0.0.21:9200"]
            index => "apache_error-%{+YYYY.MM.dd}"
        }
    }
}
[root@apache conf.d]# /usr/share/logstash/bin/logstash -f apache_log.conf

First visit the Apache website at 20.0.0.23 from the host machine to generate access logs.

Then visit http://20.0.0.21:9100 and you will find the two new Apache indexes.
Finally, you can enter the Kibana interface and create the corresponding index patterns.
The platform is now successfully built.

Origin blog.csdn.net/weixin_48191211/article/details/109356540