1. Introduction to ELK log analysis system
1.1 Overview of ELK log analysis system
The ELK log analysis system combines three open source tools: Elasticsearch, Logstash, and Kibana. Together they form an open source log management solution that can search, analyze, and visualize logs from any source and in any format.
1.2 Log processing steps
1. Centralized management of logs
2. Format the logs (Logstash) and output them to Elasticsearch
3. Index and store the formatted data (Elasticsearch)
4. Display the data in a front end (Kibana)
1.3 Introduction to Elasticsearch
Elasticsearch is developed in Java and provides a distributed, multi-user full-text search engine. It is designed for cloud environments, achieves real-time search, and is stable, reliable, fast, and easy to install and use.
The core concepts of Elasticsearch:
(1) Near real time (NRT)
Elasticsearch is a near real time search platform: there is a slight delay (usually about one second) between indexing a document and that document becoming searchable.
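To see NRT in action, index a document and immediately search for it; the search may return nothing until the refresh has run, about one second later (a rough sketch assuming a node on localhost:9200 and a hypothetical index nrt-test):
curl -XPUT 'localhost:9200/nrt-test/demo/1?pretty' -H 'Content-Type: application/json' -d '{"msg":"hello"}'
curl 'localhost:9200/nrt-test/_search?q=msg:hello&pretty'   # may show 0 hits until the ~1s refresh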
(2) Cluster
A cluster contains one or more nodes. Which cluster a node belongs to is determined by the cluster name in its configuration ("elasticsearch" by default). For small and medium-sized applications, it is normal for a cluster to start with a single node.
(3) Node
A node is a single server in the cluster. Each node also has a name (randomly assigned by default), and the node name is important when performing operation and maintenance work. By default a node joins the cluster named "elasticsearch", so if you simply start a group of nodes they automatically form an Elasticsearch cluster; of course, a single node can also form a cluster by itself.
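To see which nodes have joined a cluster and what they are named, you can query the _cat API (assuming a node is reachable on localhost:9200):
curl 'localhost:9200/_cat/nodes?v'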
(4) Index
An index is a collection of documents with somewhat similar characteristics. For example, you can have one index for customer data, another for a product catalog, and another for order data. An index is identified by a name (which must be all lowercase), and this name is used whenever you index, search, update, or delete the documents in it. You can define as many indexes as you want in a cluster.
In relational database terms, an index is analogous to a database. Elasticsearch stores its data in one or more indexes; you write documents to an index and read documents from it, while Elasticsearch uses Lucene under the hood to write data to and retrieve data from the index.
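For example, creating and deleting an index is a single REST call each (a sketch using a hypothetical index name customer):
curl -XPUT 'localhost:9200/customer?pretty'      # create the index
curl 'localhost:9200/_cat/indices?v'             # list all indexes
curl -XDELETE 'localhost:9200/customer?pretty'   # delete it again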
(5) Document
A document is the main entity in Elasticsearch; every use of Elasticsearch ultimately comes down to searching for documents. A document consists of fields.
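Documents are expressed as JSON. For example, to index a document and fetch it back (a sketch in the 5.x API used throughout this article, with a hypothetical index customer and type external):
curl -XPUT 'localhost:9200/customer/external/1?pretty' -H 'Content-Type: application/json' -d '{"name":"zhangsan"}'
curl 'localhost:9200/customer/external/1?pretty'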
(6) Shards
A shard is a piece of an index. Elasticsearch can split a complete index into multiple shards, so that a large index can be divided up and distributed across different nodes, forming a distributed search. The number of shards can only be specified before the index is created and cannot be changed afterwards. Since 5.x the default number of shards can no longer be set in the configuration file.
(7) Replicas
A replica is a copy of an index shard, and Elasticsearch can keep multiple replicas of an index. Replicas serve two purposes: first, they improve fault tolerance, since data can be recovered from a replica when a node is damaged or lost; second, they improve query efficiency, because Elasticsearch automatically load-balances search requests across replicas.
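Both values are set when the index is created, for example (a sketch; 3 shards and 1 replica are arbitrary choices):
curl -XPUT 'localhost:9200/my-index?pretty' -H 'Content-Type: application/json' -d '{"settings":{"number_of_shards":3,"number_of_replicas":1}}'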
1.4 Introduction to Logstash
Logstash is a completely open source tool that can collect, filter, format, and output your logs, and store them for later use (e.g., search).
Logstash is written in JRuby, is based on a simple message-based architecture, and runs on the Java Virtual Machine (JVM). Rather than running as a single monolithic agent or server, Logstash can be deployed as a lightweight agent and combined with other open source software to achieve different functions.
The philosophy of Logstash is very simple: it only does three things:
Collect: Data input
Enrich: Data processing, such as filtering, rewriting, etc.
Transport: Data output
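These three stages map directly onto the sections of a Logstash configuration file. A minimal sketch (the mutate tag is purely illustrative, not part of this article's setup):
input  { stdin { } }                              # Collect: read events from standard input
filter { mutate { add_tag => [ "enriched" ] } }   # Enrich: tag/rewrite events as needed
output { stdout { codec => rubydebug } }          # Transport: write events to standard output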
(1) The main components of Logstash:
Shipper: the log collector. Monitors local log files for changes and collects their latest content in time. Usually a remote agent only needs to run this component.
Indexer: the log store. Receives logs and writes them to local files.
Broker: the log hub. Connects multiple Shippers with multiple Indexers.
Search and Storage: allows events to be searched and stored.
Web Interface: a web-based display interface.
Because each of these components can be deployed independently, the Logstash architecture offers good cluster scalability.
(2) Logstash host classification:
Agent host: acts as the event shipper, sending the various log data to the central host; it only needs to run the Logstash agent program.
Central host: runs components such as the Broker, Indexer, Search and Storage, and Web Interface to receive, process, and store log data.
1.5 Introduction to Kibana
Kibana is an open source analysis and visualization platform for Elasticsearch.
It lets you interactively search and view the data stored in Elasticsearch indexes
and perform advanced data analysis, displaying the results in a variety of charts.
Main features of Kibana:
Seamless integration with Elasticsearch
Data integration and complex data analysis
Benefits for more team members
A flexible interface that is easy to share
Simple configuration; visualizes multiple data sources
Simple data export
2. Experimental system construction
2.1 Experimental environment
VMware virtual machines:
A CentOS 7.4 virtual machine, IP address 20.0.0.21, hostname node1, which will run Elasticsearch
A CentOS 7.4 virtual machine, IP address 20.0.0.22, hostname node2, which will run Elasticsearch and Kibana
A CentOS 7.4 virtual machine, IP address 20.0.0.23, hostname apache, which will run Logstash and Apache
The firewall and core protection (SELinux) are turned off
2.2 Configure the Elasticsearch environment
(1) Log in to 20.0.0.21, change the hostname, configure name resolution, and check the Java environment (install Java if it is missing)
hostnamectl set-hostname node1
vi /etc/hosts
20.0.0.21 node1
20.0.0.22 node2
java -version
(2) Install Elasticsearch
Copy the prepared package onto the host, then install:
rpm -ivh elasticsearch-5.5.0.rpm
systemctl daemon-reload
systemctl enable elasticsearch.service
(3) Modify the configuration file
cp -p /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.bak
vim /etc/elasticsearch/elasticsearch.yml
17 cluster.name: my-elk-cluster
23 node.name: node1
33 path.data: /data/elk_data
37 path.logs: /var/log/elasticsearch/
43 bootstrap.memory_lock: false
55 network.host: 0.0.0.0
59 http.port: 9200
68 discovery.zen.ping.unicast.hosts: ["node1", "node2"]
(4) Create data storage path and authorize
mkdir -p /data/elk_data/
chown elasticsearch:elasticsearch /data/elk_data/
(5) Start Elasticsearch
systemctl start elasticsearch.service
netstat -antp | grep 9200
(6) In the host machine's browser, open 20.0.0.21:9200
(7) Configure node2 (20.0.0.22). The Elasticsearch setup is almost identical to the above:
set the hostname to node2,
change line 23 of the configuration file to node.name: node2,
then open 20.0.0.22:9200 in the host browser.
(8) Cluster health check and status
In the host browser, enter 20.0.0.21:9200/_cluster/health?pretty (you can also use 20.0.0.22:9200/_cluster/health?pretty)
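A healthy two-node cluster returns JSON along these lines (values will differ; status "green" means all primary and replica shards are allocated):
{
  "cluster_name" : "my-elk-cluster",
  "status" : "green",
  "number_of_nodes" : 2,
  ...
}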
2.3 Install the elasticsearch-head plugin on node1 and node2
The raw output above is not very intuitive to read, so we install the elasticsearch-head plugin to view the cluster state more intuitively and manage it more easily. Only the operations on node1 are shown below.
(1) Compile and install Node.js, a dependency of the head component
Upload the package to the home directory:
yum -y install gcc-c++ make gcc
tar zxvf node-v8.2.1.tar.gz
cd node-v8.2.1/
./configure
make -j3    # compilation takes a long time, roughly half an hour
make install
(2) Install PhantomJS (front-end framework)
Upload the package to /usr/local/src/:
cd /usr/local/src/
tar jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2
cd phantomjs-2.1.1-linux-x86_64/bin
cp phantomjs /usr/local/bin
(3) Install elasticsearch-head (data visualization tool)
Upload the package to /usr/local/src/:
cd /usr/local/src/
tar zxvf elasticsearch-head.tar.gz
cd elasticsearch-head/
npm install
(4) Modify the main configuration file
vim /etc/elasticsearch/elasticsearch.yml
http.cors.enabled: true
http.cors.allow-origin: "*"
systemctl restart elasticsearch.service
(5) Start elasticsearch-head
cd /usr/local/src/
cd elasticsearch-head/
npm run start &
netstat -lnupt |grep 9100
Insert a test index:
curl -XPUT 'localhost:9200/index-demo/test/1?pretty&pretty' -H 'Content-Type: application/json' -d '{"user":"zhangsan","mesg":"hello world"}'
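To verify that the document was stored, fetch it back:
curl 'localhost:9200/index-demo/test/1?pretty'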
(6) Enter 20.0.0.21:9100 in the host machine's browser
2.4 Deploy Logstash on the Apache server
(1) Initial setup
hostnamectl set-hostname apache
yum -y install httpd
systemctl start httpd
java -version    # if there is no Java environment, install it
(2) Install Logstash
Upload the package to /opt:
rpm -ivh logstash-5.5.1.rpm
systemctl start logstash.service
systemctl enable logstash.service
ln -s /usr/share/logstash/bin/logstash /usr/local/bin/
(3) Run a connection test with Elasticsearch (node1) using the logstash command
Option descriptions:
● -f specifies a Logstash configuration file; Logstash configures itself according to that file
● -e is followed by a string that is treated as the Logstash configuration (if the string is empty, stdin is used as the input and stdout as the output by default)
● -t tests whether the configuration file is correct, then exits
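For example, to check a configuration file for syntax errors without starting the pipeline (using the system.conf created in a later step):
logstash -f /etc/logstash/conf.d/system.conf -t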
Test using standard input as the input and standard output as the output:
[root@apache opt]# logstash -e 'input { stdin{} } output { stdout{} }'
The stdin plugin is now waiting for input:
11:33:17.455 [Api Webserver] INFO logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
www.baidu.com    # type a URL here
2020-10-29T03:33:26.424Z apache www.baidu.com
Test: use the rubydebug codec to display detailed output (a codec is an encoder/decoder):
[root@apache opt]# logstash -e 'input { stdin{} } output { stdout{ codec=>rubydebug } }'
The stdin plugin is now waiting for input:
11:30:32.066 [Api Webserver] INFO logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
www.baidu.com    # type a URL here
{
    "@timestamp" => 2020-10-29T03:30:59.822Z,
      "@version" => "1",
          "host" => "apache",
       "message" => "www.baidu.com"
}
Use Logstash to write the input into Elasticsearch:
[root@apache opt]# logstash -e 'input { stdin{} } output { elasticsearch { hosts=>["20.0.0.21:9200"] } }'
The stdin plugin is now waiting for input:
11:43:46.129 [Api Webserver] INFO logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
www.baidu.com    # type a URL here
(4) In the host browser, visit the node1 head page (20.0.0.21:9100) to view the index information
(5) Connection configuration
A Logstash configuration file consists of three main parts: input, output, and filter (used as needed)
chmod o+r /var/log/messages
[root@apache opt]# vi /etc/logstash/conf.d/system.conf
input {
    file {
        path => "/var/log/messages"
        type => "system"
        start_position => "beginning"
    }
}
output {
    elasticsearch {
        hosts => ["20.0.0.21:9200"]
        index => "system-%{+YYYY.MM.dd}"
    }
}
[root@apache opt]# systemctl restart logstash.service
View the result from the host browser
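You can also confirm the new index from the command line (a quick check; the exact name carries the current date):
curl '20.0.0.21:9200/_cat/indices?v' | grep system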
2.5 Install Kibana on the node2 host
Upload the package to /usr/local/src/
cd /usr/local/src/
rpm -ivh kibana-5.5.1-x86_64.rpm
cd /etc/kibana/
cp -p kibana.yml kibana.yml.bak
[root@node2 kibana]# vi kibana.yml
2 server.port: 5601
7 server.host: "0.0.0.0"
21 elasticsearch.url: "http://20.0.0.21:9200"
30 kibana.index: ".kibana"
[root@node2 kibana]# systemctl start kibana.service
[root@node2 kibana]# systemctl enable kibana.service
View from the host browser: http://20.0.0.22:5601
2.6 Connect the Apache log files on the Apache host
cd /etc/logstash/conf.d/
touch apache_log.conf
[root@apache conf.d]# vi apache_log.conf
input {
    file {
        path => "/etc/httpd/logs/access_log"
        type => "access"
        start_position => "beginning"
    }
    file {
        path => "/etc/httpd/logs/error_log"
        type => "error"
        start_position => "beginning"
    }
}
output {
    if [type] == "access" {
        elasticsearch {
            hosts => ["20.0.0.21:9200"]
            index => "apache_access-%{+YYYY.MM.dd}"
        }
    }
    if [type] == "error" {
        elasticsearch {
            hosts => ["20.0.0.21:9200"]
            index => "apache_error-%{+YYYY.MM.dd}"
        }
    }
}
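The configuration above ships raw log lines as-is. To have the access log split into fields (client IP, request, response code, and so on), a filter section could be added between input and output; a minimal sketch using the stock COMBINEDAPACHELOG grok pattern, not part of the original setup:
filter {
    if [type] == "access" {
        grok {
            match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
    }
}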
[root@apache conf.d]# /usr/share/logstash/bin/logstash -f apache_log.conf
First, visit the Apache site at 20.0.0.23 from the host machine to generate access logs.
Then visit http://20.0.0.21:9100 and confirm that the two new indexes appear.
Finally, open the Kibana interface and create index patterns for them.
The platform is now successfully built.