ELK enterprise-level log analysis system

Preface

In large-scale enterprise environments, the common problems are how to archive huge volumes of logs, what to do when plain-text search becomes too slow, and how to query logs across multiple dimensions. This calls for centralized log management: collecting and aggregating the logs of all servers. The usual solution is to build a centralized log collection system that gathers, manages, and exposes the logs of every node in a unified way.
Enterprises therefore set up dedicated log servers to improve security and centralize management, but the resulting large number of log files makes analysis difficult. The ELK stack introduced here is designed to solve exactly this problem.

1. ELK overview

1. ELK log analysis system

ELK is the combination of three open-source projects: Elasticsearch, Logstash, and Kibana. For real-time data retrieval and analysis the three are usually used together, and all of them are maintained by Elastic (elastic.co), hence the abbreviation.

2. Log processing steps in ELK


Step 1: Collect and centrally manage the logs (Beats)
Step 2: Format the logs (Logstash), then output the formatted data to Elasticsearch
Step 3: Index and store the formatted data (Elasticsearch)
Step 4: Display the data on the front end (Kibana)
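As a hedged illustration of steps 1 and 2 (the deployment later in this article ships logs with Logstash's file input rather than Beats), a Filebeat agent on an application server could forward logs to Logstash roughly as follows; the port 5044 and the log path are illustrative assumptions, not part of the setup below:

# filebeat.yml on the application server (Filebeat 5.x syntax, illustrative only)
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/messages
output.logstash:
  hosts: ["192.168.163.13:5044"]     # send events to the Logstash host

# matching Logstash pipeline: receive from Beats, forward to Elasticsearch
input { beats { port => 5044 } }
output { elasticsearch { hosts => ["192.168.163.11:9200"] } }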

3. Overview of Elasticsearch

Elasticsearch is a search server based on Lucene. It provides a distributed, multi-user full-text search engine exposed through a RESTful web interface.
Elasticsearch is developed in Java and released as open source under the Apache license. It is the second most popular enterprise search engine. Designed for cloud environments, it delivers real-time search and is stable, reliable, fast, and easy to install and use.

(1) Features of Elasticsearch

  • Near-real-time search
  • Cluster
  • Node
  • Index
    • Index (database) → type (table) → document (record)
  • Shards and replicas

(2) Shards and replicas

Among the features above, the most important are shards and replicas; they are the main reason the Elasticsearch database can power mainstream search services such as Baidu, and in theory they can increase performance roughly fourfold.
Consider a practical case: the data stored in an index may exceed the hardware limits of a single node. For example, one billion documents occupying 1 TB of space may not fit on the disk of a single node, or a single node may be too slow to serve search requests. To solve this, Elasticsearch can divide an index into multiple shards. When creating an index, you can define the number of shards you want (see the example after the lists below); each shard is a fully functional, independent index that can live on any node in the cluster.

  • Features of shards:
    • Horizontal scaling, increasing storage capacity
    • Distributed, parallel operations across shards, improving performance and throughput

How shards are distributed and how the documents matched by a search request are aggregated are handled entirely by Elasticsearch and are transparent to the user.

Network failures and other problems can strike unexpectedly at any time. For robustness, a failover mechanism is strongly recommended so that no failure can leave a shard or node unavailable. To this end, Elasticsearch lets us make one or more copies of each index shard, called replica shards, or simply replicas.

  • Features of replicas:
    • High availability in case a shard or node fails; for this reason, a replica shard must be located on a different node from its primary shard
    • Higher performance and throughput, since searches can run on all replicas in parallel
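As a concrete, hedged illustration of how shard and replica counts are defined at index-creation time, an index can be created with explicit settings through the RESTful interface on a running Elasticsearch node (assumed here to be reachable at localhost:9200; the index name test-index is only an example):

# Create an index with 5 primary shards and 1 replica per shard (Elasticsearch 5.x API)
curl -XPUT 'localhost:9200/test-index?pretty' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'

With 5 primary shards and 1 replica each, the index is stored as 10 shards in total, spread across the nodes of the cluster.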

4. Overview of LogStash

  • A powerful data processing tool
  • Data transmission, format processing, and formatted output can be realized
  • Data input, data processing (such as filtering, rewriting, etc.) and data output
  • Commonly used plug-ins: Input, Filter Plugin, Output
    • Input: Collect source data (access logs, error logs, etc.)
    • Filter Plugin: used to filter logs and format processing
    • Output: output log
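These three plug-in stages map directly onto the structure of a Logstash configuration file. A minimal sketch follows; the file name and the grok pattern are illustrative assumptions, not part of the deployment later in this article:

# /etc/logstash/conf.d/example.conf (hypothetical)
input {
    stdin { }                                                   # Input: read events from standard input
}
filter {
    grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }   # Filter Plugin: parse Apache-style log lines into fields
}
output {
    stdout { codec => rubydebug }                               # Output: print the structured event
}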

5. Kibana overview

  • An open source analysis and visualization platform for Elasticsearch
  • Search and view data stored in Elasticsearch index
  • Advanced data analysis and display through various charts
  • Kibana main functions
    • Seamless integration with Elasticsearch
      • The Kibana architecture is customized for Elasticsearch, and any structured and unstructured data can be added to the Elasticsearch index. Kibana also takes full advantage of the powerful search and analysis capabilities of Elasticsearch.
    • Integrate data
      • Kibana can better handle massive amounts of data and create column charts, line charts, scatter charts, histograms, pie charts, and maps based on this.
    • Complex data analysis
      • Kibana has improved the analysis capabilities of Elasticsearch, able to analyze data more intelligently, perform mathematical transformations, and segment data as required.
    • Let more team members benefit
      • The powerful database visualization interface allows all business positions to benefit from the data collection.
    • Flexible interface, easier to share
      • Use Kibana to create, save, and share data more conveniently, and quickly communicate visualized data.
    • Simple configuration
      • The configuration and activation of Kibana is very simple, and the user experience is very friendly. Kibana comes with a web server, which can be quickly up and running.
    • Visualize multiple data sources
      • Kibana can easily integrate data from Logstash, ES-Hadoop, Beats, or third-party technologies into Elasticsearch. Supported third-party technologies include Apache Flume, Fluentd, and others.
    • Simple data export
      • Kibana can easily export the data of interest, merge it with other data collections and quickly model and analyze it, and discover new results.

2. Deploy ELK log analysis system

A total of six installation packages are used.

Case topology

Description of requirements

  • Configure ELK log analysis cluster
  • Use Logstash to collect logs
  • Use Kibana to view analysis logs

Case environment
Configure and install the ELK log analysis system in cluster mode with 2 Elasticsearch nodes, and monitor the Apache server's logs.

Host            Operating system   IP address        Software
node1           CentOS 7           192.168.163.11    Elasticsearch / Kibana
node2           CentOS 7           192.168.163.12    Elasticsearch
apache          CentOS 7           192.168.163.13    httpd / Logstash
Client (host)   Windows 10         192.168.163.1     -

Preparation for the experiment
Disable the firewall and SELinux
Change the host names

systemctl stop firewalld.service
systemctl disable firewalld.service
setenforce 0

hostnamectl set-hostname <hostname>
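For this case environment, the concrete commands are, one per host:

hostnamectl set-hostname node1     # on 192.168.163.11
hostnamectl set-hostname node2     # on 192.168.163.12
hostnamectl set-hostname apache    # on 192.168.163.13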


1. Configure the elasticsearch environment

node1(192.168.163.11)
node2 (192.168.163.12)

echo '192.168.163.11 node1' >> /etc/hosts
echo '192.168.163.12 node2' >> /etc/hosts

java -version    # if Java is not installed: yum -y install java

2. Deploy elasticsearch software

node1(192.168.163.11)
node2 (192.168.163.12)
(1) Install the elasticsearch RPM package
Upload elasticsearch-5.5.0.rpm to the /opt directory

cd /opt
rpm -ivh elasticsearch-5.5.0.rpm


(2) Load system service

systemctl daemon-reload
systemctl enable elasticsearch.service


(3) Change the main configuration file of elasticsearch

cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.bak

vim /etc/elasticsearch/elasticsearch.yml
# line 17: uncomment and modify; cluster name
cluster.name: my-elk-cluster
# line 23: uncomment and modify; node name (set to node2 on node2)
node.name: node1
# line 33: uncomment and modify; data storage path
path.data: /data/elk_data
# line 37: uncomment and modify; log storage path
path.logs: /var/log/elasticsearch
# line 43: uncomment and modify; do not lock memory at startup
bootstrap.memory_lock: false
# line 55: uncomment and modify; IP address the service binds to; 0.0.0.0 means all addresses
network.host: 0.0.0.0
# line 59: uncomment; listening port is 9200 (the default)
http.port: 9200
# line 68: uncomment and modify; cluster discovery is done by unicast; specify the nodes to discover: node1, node2
discovery.zen.ping.unicast.hosts: ["node1", "node2"]


Verify configuration

grep -v "^#" /etc/elasticsearch/elasticsearch.yml


(4) Create a data storage path and authorize

mkdir -p /data/elk_data
chown elasticsearch:elasticsearch /data/elk_data/

(5) Start elasticsearch and check whether it started successfully

systemctl start elasticsearch
netstat -antp |grep 9200


(6) View node information
Access on the host 192.168.163.1

http://192.168.163.11:9200
http://192.168.163.12:9200


(7) Check the health status of the cluster
Access on the host 192.168.163.1

http://192.168.163.11:9200/_cluster/health?pretty
http://192.168.163.12:9200/_cluster/health?pretty
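The same check can also be done from the command line. For a healthy two-node cluster, the response should show "status" : "green" and "number_of_nodes" : 2 (the field names are standard Elasticsearch health fields; the exact values stated here are assumptions about this particular setup):

curl -s 'http://192.168.163.11:9200/_cluster/health?pretty'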


(8) View the cluster status
Access on the host 192.168.163.1

http://192.168.163.11:9200/_cluster/state?pretty
http://192.168.163.12:9200/_cluster/state?pretty


3. Install the elasticsearch-head plugin

  • Install the elasticsearch-head plugin to manage the cluster

(1) Compile and install the node component dependency package
node1(192.168.163.11)
node2 (192.168.163.12)

yum -y install gcc gcc-c++ make

# Upload the package node-v8.2.1.tar.gz to /opt
cd /opt
tar xzvf node-v8.2.1.tar.gz
cd node-v8.2.1/
./configure && make && make install
# This step takes a long time, roughly 20-30 minutes


(2) Install phantomjs (front-end framework)
node1(192.168.163.11)
node2 (192.168.163.12)

# Upload the package phantomjs-2.1.1-linux-x86_64.tar.bz2 to the /opt directory
cd /opt
tar jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2 -C /usr/local/src/
cd /usr/local/src/phantomjs-2.1.1-linux-x86_64/bin
cp phantomjs /usr/local/bin


(3) Install elasticsearch-head (data visualization tool)
node1(192.168.163.11)
node2 (192.168.163.12)

# Upload the package elasticsearch-head.tar.gz to /opt
cd /opt
tar zxvf elasticsearch-head.tar.gz -C /usr/local/src/
cd /usr/local/src/elasticsearch-head/
npm install


(4) Modify the main configuration file
node1(192.168.163.11)
node2 (192.168.163.12)

vim /etc/elasticsearch/elasticsearch.yml
......
#------- add the following at the end of the file --------
http.cors.enabled: true
http.cors.allow-origin: "*"

#----------- parameter explanation -----------------------------
http.cors.enabled: true				# enable cross-origin access support (default: false)
http.cors.allow-origin: "*"			# allow cross-origin requests from any domain


systemctl restart elasticsearch.service
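To verify that the cross-origin setting took effect, you can optionally send a request with an Origin header and look for the Access-Control-Allow-Origin header in the response; this is just a quick sanity check, not part of the original steps:

curl -i -H "Origin: http://192.168.163.11:9100" 'http://192.168.163.11:9200'
# The response headers should now include: access-control-allow-origin: *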


(5) Start elasticsearch-head
node1(192.168.163.11)
node2 (192.168.163.12)

# The service must be started from the extracted elasticsearch-head directory, because the process reads the gruntfile.js in that directory; otherwise it may fail to start.
cd /usr/local/src/elasticsearch-head/
npm run start &

> [email protected] start /usr/local/src/elasticsearch-head
> grunt server

Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://localhost:9100

# elasticsearch-head listens on port 9100
netstat -natp |grep 9100


(6) Use the elasticsearch-head plugin to view the cluster status
Access on the host 192.168.163.1

http://192.168.163.11:9100
In the field next to "Elasticsearch", enter
http://192.168.163.11:9200

http://192.168.163.12:9100
In the field next to "Elasticsearch", enter
http://192.168.163.12:9200


(7) Create an index
node1(192.168.163.11)
Create an index named index-demo with the type test

curl -XPUT 'localhost:9200/index-demo/test/1?pretty&pretty' -H 'content-Type: application/json' -d '{"user":"zhangsan","mesg":"hello world"}'
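To confirm the document was indexed, it can be fetched back with a GET request (a quick check, not in the original steps):

curl -XGET 'localhost:9200/index-demo/test/1?pretty'
# Should return the stored JSON {"user":"zhangsan","mesg":"hello world"} along with the index metadata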


Back to the host 192.168.163.1
Open the browser and enter the address to view the index information

http://192.168.163.11:9100

In the elasticsearch-head overview you can see that the index is split into 5 shards by default, each with one replica.
Click Data Browse and you will see that the index created on node1 is index-demo, its type is test, and the related document information is displayed.

4. Install logstash

Collect logs and output to elasticsearch

(1) Install the Apache service (httpd)
apache(192.168.163.13)

yum -y install httpd
systemctl start httpd


(2) Install Java environment
apache(192.168.163.13)

java -version        # if Java is not installed: yum -y install java

(3) Install logstash
apache(192.168.163.13)

# Upload logstash-5.5.1.rpm to the /opt directory
cd /opt
rpm -ivh logstash-5.5.1.rpm

systemctl start logstash.service
systemctl enable logstash.service

# Create a symbolic link for logstash
ln -s /usr/share/logstash/bin/logstash /usr/local/bin/


(4) Test the logstash command
apache(192.168.163.13)

Description of the options:
-f  specify a Logstash configuration file; Logstash is configured according to that file
-e  followed by a string that is treated as the Logstash configuration (if the string is "", stdin is used as input and stdout as output by default)
-t  test whether the configuration file is correct, then exit

Define input and output streams:
standard input is used for input, standard output is used for output (similar to a pipeline)

logstash -e 'input { stdin{} } output { stdout{} }'

Use rubydebug to display detailed output (codec is an encoder/decoder plug-in)

logstash -e 'input { stdin{} } output { stdout{ codec=>rubydebug } }'
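With the rubydebug codec each event is printed as a structured hash. Typing a line such as hello world produces output roughly like the following (the timestamp and host value will of course differ; this sample is illustrative, not captured from the original run):

hello world
{
    "@timestamp" => 2021-03-04T08:00:00.000Z,
      "@version" => "1",
          "host" => "apache",
       "message" => "hello world"
}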


Use Logstash to write information into Elasticsearch

logstash -e 'input { stdin{} } output { elasticsearch { hosts=>["192.168.163.11:9200"] } }'


Access on the host 192.168.163.1
View index information

A new index named logstash-<date> appears at
http://192.168.163.11:9100

Click Data Browse to view the corresponding content.

(5) Configure Logstash on the Apache host to feed logs into Elasticsearch
apache(192.168.163.13)

The Logstash configuration file consists mainly of three parts: input, output, and filter (the filter part is optional, used as required)

chmod o+r /var/log/messages
ll /var/log/messages

vim /etc/logstash/conf.d/system.conf
input {
    file {
        path => "/var/log/messages"
        type => "system"
        start_position => "beginning"
    }
}
output {
    elasticsearch {
        hosts => ["192.168.163.11:9200"]
        index => "system-%{+YYYY.MM.dd}"
    }
}
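Before restarting the service, the new configuration can optionally be validated with the -t option described earlier (an extra check, not in the original steps):

logstash -f /etc/logstash/conf.d/system.conf -t
# If the syntax is valid, Logstash reports that the configuration is OK and exits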


systemctl restart logstash.service


Access on the host 192.168.163.1
View index information

A new index named system-<date> appears at
http://192.168.163.11:9100


5. Install Kibana

node1(192.168.163.11)

# Upload kibana-5.5.1-x86_64.rpm to the /opt directory
cd /opt
rpm -ivh kibana-5.5.1-x86_64.rpm

cd /etc/kibana/
cp kibana.yml kibana.yml.bak

vim kibana.yml
# line 2: uncomment; the port Kibana listens on (default 5601)
server.port: 5601
# line 7: uncomment and modify; the address Kibana listens on
server.host: "0.0.0.0"
# line 21: uncomment and modify; connect Kibana to Elasticsearch
elasticsearch.url: "http://192.168.163.11:9200"
# line 30: uncomment; add the .kibana index in Elasticsearch
kibana.index: ".kibana"

systemctl start kibana.service 
systemctl enable kibana.service
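As with Elasticsearch and elasticsearch-head, you can optionally confirm that Kibana is listening on its port (an extra check, not in the original steps):

netstat -natp | grep 5601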


Access on the host 192.168.163.1

192.168.163.11:5601

When logging in for the first time, create an index pattern named system-* (this matches the system log index created earlier), then click the Create button at the bottom.

Then click the Discover button in the upper left corner and you will find the system-* data.

Then click add next to the host field below; the panel on the right will now show only the Time and host columns, which is easier to read.

(6) Feed the Apache host's log files (access log and error log) into Elasticsearch
apache(192.168.163.13)

cd /etc/logstash/conf.d/

vim apache_log.conf
input {
    file {
        path => "/etc/httpd/logs/access_log"
        type => "access"
        start_position => "beginning"
    }
    file {
        path => "/etc/httpd/logs/error_log"
        type => "error"
        start_position => "beginning"
    }
}
output {
    if [type] == "access" {
        elasticsearch {
            hosts => ["192.168.163.11:9200"]
            index => "apache_access-%{+YYYY.MM.dd}"
        }
    }
    if [type] == "error" {
        elasticsearch {
            hosts => ["192.168.163.11:9200"]
            index => "apache_error-%{+YYYY.MM.dd}"
        }
    }
}

/usr/share/logstash/bin/logstash -f apache_log.conf


Access on the host 192.168.163.1
Open http://192.168.163.13 to generate some site access records.
Open the browser and enter http://192.168.163.11:9100 to view the index information; you will find apache_error-2021.03.04 and apache_access-2021.03.04.

Open the browser and enter http://192.168.163.11:5601, click the Management option in the lower left corner, go to Index Patterns, then Create Index Pattern, and create index patterns for apache_error-* and apache_access-* respectively.

Summary

There are still imperfections in this architecture, so it can be further optimized and extended, for example into an EFK architecture.
An EFK architecture is Elasticsearch + Logstash + Filebeat + Kafka + Kibana + Redis, where Elasticsearch indexes and stores the data; Logstash handles format conversion; Filebeat (a lightweight log collection tool) collects the logs; Kafka (a message queue able to handle hundreds of thousands of concurrent messages per second) and Redis (a caching service) absorb high concurrency; and Kibana displays the data on the front end.
The specifics are beyond the scope of this article; readers who are interested can explore them on their own.
