Table of Contents
Preface
In large-scale enterprise environments, common problems include: logs too numerous to archive, plain-text search that is too slow, and the need to query logs across multiple dimensions. This calls for centralized log management, collecting and summarizing the logs of every server. The usual solution is a centralized log collection system that gathers, manages, and exposes the logs of all nodes in a unified way.
Enterprises therefore set up dedicated log servers to improve security and centralize management, but the resulting volume of log files makes analysis difficult. The ELK stack introduced here is designed to solve exactly this problem.
1. ELK overview
1. ELK log analysis system
ELK is a combination of three open-source projects: Elasticsearch, Logstash, and Kibana. For real-time data retrieval and analysis the three are usually used together, and all of them are maintained by Elastic.co, hence the abbreviation.
2. Log processing steps in ELK
Step 1: Perform centralized management of logs (beats)
Step 2: Format the logs (Logstash), and then output the formatted data to Elasticsearch
Step 3: Index and store the formatted data ( Elasticsearch)
Step 4: Display of front-end data (Kibana)
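The four steps above can be sketched as a toy pipeline. This is a hypothetical pure-Python illustration of the data flow only, not how ELK is actually implemented; all function names are made up:

```python
# Hypothetical sketch of the ELK processing steps:
# collect (beats) -> format (Logstash) -> index (Elasticsearch) -> display (Kibana)

def collect(raw_logs):
    # Step 1: beats ship raw log lines from each server
    return list(raw_logs)

def format_log(line):
    # Step 2: Logstash parses a raw line into a structured document
    level, _, message = line.partition(": ")
    return {"level": level, "message": message}

def index(store, doc_id, doc):
    # Step 3: Elasticsearch stores the formatted document under an ID
    store[doc_id] = doc
    return store

store = {}
for i, line in enumerate(collect(["ERROR: disk full", "INFO: service started"])):
    index(store, i, format_log(line))

# Step 4: Kibana would query the store and render charts; here we just inspect it
print(store[0]["level"])  # ERROR
```

Each real component adds far more (buffering, filtering, replication), but the division of labor is the same.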
3. Overview of Elasticsearch
Elasticsearch is a Lucene-based search server. It provides a full-text search engine with distributed multi-user capabilities based on a RESTful web interface.
Elasticsearch is developed in Java and released as open source under the Apache license. It is among the most popular enterprise search engines. Designed for cloud computing, it delivers real-time search and is stable, reliable, fast, and easy to install and use.
(1) Features of Elasticsearch
- Near real-time search
- Cluster
- Node
- Index
- Index (library) → type (table) → document (record)
- Shards and replicas
(2) Shards and replicas
Among the features above, the most important are shards and replicas, which are the main reason the es database (Elasticsearch) can power mainstream search engines such as Baidu; in theory they can multiply performance several times over.
Consider a practical situation: the data stored in an index may exceed the hardware limits of a single node. For example, one billion documents occupying 1 TB may not fit on the disk of a single node, or a single node may be too slow to serve search requests. To solve this, Elasticsearch can divide an index into multiple shards. When creating an index, you define the number of shards you want; each shard is a fully functional, independent index that can live on any node in the cluster.
- Features of sharding:
- Horizontal scaling that increases storage capacity
- Distributed, parallel cross-shard operations that improve performance and throughput
The mechanics of distributed sharding, and how the documents matching a search request are aggregated, are controlled entirely by Elasticsearch and are transparent to the user.
Network problems and other failures can happen unexpectedly at any time. For robustness, a failover mechanism is strongly recommended so that no failure can leave a shard or node unavailable. To this end, Elasticsearch lets you make one or more copies of each shard of an index, called shard replicas (or simply replicas).
- Features of the copy:
- High availability in case a shard or node fails; for this reason, a shard's replicas must live on different nodes
- Higher performance and throughput, since searches can run on all replicas in parallel
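As a rough illustration of why any node can serve a request: a document is routed to a primary shard deterministically from its routing value. Elasticsearch actually uses a murmur3 hash modulo the number of primary shards; the crc32 hash below is a stand-in for illustration only:

```python
import zlib

# Sketch of shard routing: same ID always lands on the same shard, so every
# node can forward a request correctly. (Elasticsearch really uses murmur3;
# crc32 here is just a deterministic stand-in.)
NUM_PRIMARY_SHARDS = 5  # fixed when the index is created

def route_to_shard(doc_id: str) -> int:
    return zlib.crc32(doc_id.encode()) % NUM_PRIMARY_SHARDS

shard_a = route_to_shard("user-42")
shard_b = route_to_shard("user-42")
assert shard_a == shard_b  # routing is deterministic
print(shard_a)
```

This is also why the number of primary shards cannot be changed after index creation: rehashing would move every document.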
4. Overview of Logstash
- A powerful data processing tool
- Realizes data transport, format processing, and formatted output
- Works in three stages: data input, data processing (filtering, rewriting, etc.), and data output
- Commonly used plug-ins: Input, Filter Plugin, Output
- Input: Collect source data (access logs, error logs, etc.)
- Filter Plugin: used to filter logs and format processing
- Output: outputs the processed logs
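As a rough illustration of what a Filter stage does (real deployments usually use the grok filter plugin), here is a toy parser that turns an Apache access-log line into structured fields. The regex and field names are illustrative assumptions, not Logstash's actual internals:

```python
import re

# Hypothetical sketch of a Logstash-style filter stage: parse a raw Apache
# access-log line into named fields (real pipelines use the grok filter).
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def filter_stage(line: str) -> dict:
    m = LOG_PATTERN.match(line)
    # On a parse failure, tag the event instead of dropping it
    return m.groupdict() if m else {"tags": ["_parsefailure"], "message": line}

line = '192.168.163.1 - - [04/Mar/2021:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 4523'
event = filter_stage(line)
print(event["status"], event["path"])  # 200 /index.html
```

The structured fields are what make per-dimension queries in Elasticsearch and Kibana possible later on.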
5. Kibana overview
- An open source analysis and visualization platform for Elasticsearch
- Search and view data stored in Elasticsearch index
- Advanced data analysis and display through various charts
- Kibana main functions
- Elasticsearch seamless integration
- The Kibana architecture is customized for Elasticsearch, and any structured and unstructured data can be added to the Elasticsearch index. Kibana also takes full advantage of the powerful search and analysis capabilities of Elasticsearch.
- Integrate data
- Kibana handles massive amounts of data well and can build bar charts, line charts, scatter plots, histograms, pie charts, and maps from it.
- Complex data analysis
- Kibana has improved the analysis capabilities of Elasticsearch, able to analyze data more intelligently, perform mathematical transformations, and segment data as required.
- Let more team members benefit
- The powerful database visualization interface allows all business positions to benefit from the data collection.
- Flexible interface, easier to share
- Use Kibana to create, save, and share data more conveniently, and quickly communicate visualized data.
- Simple configuration
- Kibana is simple to configure and enable, and the user experience is friendly. It ships with its own web server, so it can be up and running quickly.
- Visualize multiple data sources
- Kibana can easily integrate data from Logstash, ES-Hadoop, Beats, or third-party technologies such as Apache Flume and Fluentd into Elasticsearch.
- Simple data export
- Kibana can easily export the data of interest, merge it with other data collections and quickly model and analyze it, and discover new results.
2. Deploy ELK log analysis system
Six installation packages in total
Related installation packages
Case topology
Requirements
- Configure ELK log analysis cluster
- Use Logstash to collect logs
- Use Kibana to view analysis logs
Case environment
Configure and install an ELK log analysis system in cluster mode with 2 Elasticsearch nodes, and monitor the logs of an Apache server.
Host | Operating system | IP address | Software
---|---|---|---
node1 | CentOS 7 | 192.168.163.11 | Elasticsearch / Kibana
node2 | CentOS 7 | 192.168.163.12 | Elasticsearch
apache | CentOS 7 | 192.168.163.13 | httpd / Logstash
Client (host) | Windows 10 | 192.168.163.1 | |
Preparation for the experiment
Disable the firewall and SELinux
Set the host names
systemctl stop firewalld.service
systemctl disable firewalld.service
setenforce 0
hostnamectl set-hostname <hostname>
1. Configure the elasticsearch environment
node1(192.168.163.11)
node2 (192.168.163.12)
echo '192.168.163.11 node1' >> /etc/hosts
echo '192.168.163.12 node2' >> /etc/hosts
java -version #if Java is not installed: yum -y install java
2. Deploy elasticsearch software
node1(192.168.163.11)
node2 (192.168.163.12)
(1) Install the elasticsearch RPM package
Upload elasticsearch-5.5.0.rpm to the /opt directory
cd /opt
rpm -ivh elasticsearch-5.5.0.rpm
(2) Load system service
systemctl daemon-reload
systemctl enable elasticsearch.service
(3) Change the main configuration file of elasticsearch
cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.bak
vim /etc/elasticsearch/elasticsearch.yml
#Line 17: uncomment and modify; cluster name
cluster.name: my-elk-cluster
#Line 23: uncomment and modify; node name (change to node2 on node2)
node.name: node1
#Line 33: uncomment and modify; data storage path
path.data: /data/elk_data
#Line 37: uncomment and modify; log storage path
path.logs: /var/log/elasticsearch
#Line 43: uncomment and modify; do not lock memory at startup
bootstrap.memory_lock: false
#Line 55: uncomment and modify; IP address the service binds to, 0.0.0.0 means all addresses
network.host: 0.0.0.0
#Line 59: uncomment; listening port is 9200 (default)
http.port: 9200
#Line 68: uncomment and modify; cluster discovery uses unicast, specify the nodes to discover: node1, node2
discovery.zen.ping.unicast.hosts: ["node1", "node2"]
Verify configuration
grep -v "^#" /etc/elasticsearch/elasticsearch.yml
(4) Create a data storage path and authorize
mkdir -p /data/elk_data
chown elasticsearch:elasticsearch /data/elk_data/
(5) Start elasticsearch and verify that it started successfully
systemctl start elasticsearch
netstat -antp |grep 9200
(6) View node information
Access on the host 192.168.163.1
http://192.168.163.11:9200
http://192.168.163.12:9200
(7) Check the health status of the cluster
Access on the host 192.168.163.1
http://192.168.163.11:9200/_cluster/health?pretty
http://192.168.163.12:9200/_cluster/health?pretty
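To interpret the health response, the sketch below parses a sample of the JSON that GET /_cluster/health returns. The sample values are illustrative; in practice you would fetch the JSON with curl from one of the nodes:

```python
import json

# Sample of what GET /_cluster/health?pretty returns for a healthy
# two-node cluster (values here are illustrative, not captured output)
sample_response = '''
{
  "cluster_name": "my-elk-cluster",
  "status": "green",
  "number_of_nodes": 2,
  "active_primary_shards": 5,
  "active_shards": 10
}
'''

health = json.loads(sample_response)

# green:  all primary and replica shards are allocated
# yellow: all primaries allocated, but some replicas are not
# red:    at least one primary shard is missing
assert health["status"] in ("green", "yellow", "red")
print(health["status"], health["number_of_nodes"])  # green 2
```

With two nodes and one replica per primary, a green status means each primary and its replica sit on different nodes, as expected.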
(8) View the cluster status
Access on the host 192.168.163.1
http://192.168.163.11:9200/_cluster/state?pretty
http://192.168.163.12:9200/_cluster/state?pretty
3. Install the elasticsearch-head plugin
- Install the elasticsearch-head plugin to manage the cluster
(1) Compile and install the node component dependency package
node1(192.168.163.11)
node2 (192.168.163.12)
yum -y install gcc gcc-c++ make
Upload the package node-v8.2.1.tar.gz to /opt
cd /opt
tar xzvf node-v8.2.1.tar.gz
cd node-v8.2.1/
./configure && make && make install
This step takes quite a long time, roughly 20 to 30 minutes
(2) Install phantomjs (front-end framework)
node1(192.168.163.11)
node2 (192.168.163.12)
Upload the package phantomjs-2.1.1-linux-x86_64.tar.bz2 to the /opt directory
cd /opt
tar jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2 -C /usr/local/src/
cd /usr/local/src/phantomjs-2.1.1-linux-x86_64/bin
cp phantomjs /usr/local/bin
(3) Install elasticsearch-head (data visualization tool)
node1(192.168.163.11)
node2 (192.168.163.12)
Upload the package elasticsearch-head.tar.gz to /opt
cd /opt
tar zxvf elasticsearch-head.tar.gz -C /usr/local/src/
cd /usr/local/src/elasticsearch-head/
npm install
(4) Modify the main configuration file
node1(192.168.163.11)
node2 (192.168.163.12)
vim /etc/elasticsearch/elasticsearch.yml
......
#-------Append the following at the end of the file--------
http.cors.enabled: true
http.cors.allow-origin: "*"
#-----------Parameter explanation-----------------------------
http.cors.enabled: true #enable cross-origin access support; the default is false
http.cors.allow-origin: "*" #allow cross-origin requests from any domain
systemctl restart elasticsearch.service
(5) Start elasticsearch-head
node1(192.168.163.11)
node2 (192.168.163.12)
The service must be started from the unpacked elasticsearch-head directory, because the process reads the gruntfile.js file in that directory; otherwise it may fail to start.
cd /usr/local/src/elasticsearch-head/
npm run start &
> [email protected] start /usr/local/src/elasticsearch-head
> grunt server
Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://localhost:9100
elasticsearch-head listens on port 9100
netstat -natp |grep 9100
(6) Use the elasticsearch-head plugin to view the cluster status
Access on the host 192.168.163.1
http://192.168.163.11:9100
In the field after Elasticsearch, enter
http://192.168.163.11:9200
http://192.168.163.12:9100
In the field after Elasticsearch, enter
http://192.168.163.12:9200
(7) Create an index
node1(192.168.163.11)
Create an index named index-demo with type test
curl -XPUT 'localhost:9200/index-demo/test/1?pretty&pretty' -H 'content-Type: application/json' -d '{"user":"zhangsan","mesg":"hello world"}'
Back to the host 192.168.163.1
Open the browser and enter the address to view the index information
http://192.168.163.11:9100
In the figure below, you can see that the index is split into 5 shards by default, with one replica.
Click Data Browse and you will find the index index-demo created on node1, with type test and its related information.
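As a quick sanity check of those defaults: with 5 primary shards and 1 replica per primary, the index occupies 10 shards across the cluster.

```python
# Total shard count for the 5.x defaults shown above:
# 5 primary shards, each with 1 replica copy
primaries = 5
replicas_per_primary = 1
total_shards = primaries * (1 + replicas_per_primary)
print(total_shards)  # 10
```

This matches the _cluster/health output, where active_shards is twice active_primary_shards when every replica is allocated.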
4. Install logstash
Collect logs and output to elasticsearch
(1) Install the Apache service (httpd)
apache(192.168.163.13)
yum -y install httpd
systemctl start httpd
(2) Install Java environment
apache(192.168.163.13)
java -version #if Java is not installed: yum -y install java
(3) Install logstash
apache(192.168.163.13)
Upload logstash-5.5.1.rpm to the /opt directory
cd /opt
rpm -ivh logstash-5.5.1.rpm
systemctl start logstash.service
systemctl enable logstash.service
#Create a symlink for logstash
ln -s /usr/share/logstash/bin/logstash /usr/local/bin/
(4) Test the logstash command
apache(192.168.163.13)
Option descriptions:
-f: specify the Logstash configuration file and configure Logstash from it
-e: the string that follows is used as the Logstash configuration (if it is "", stdin is used as input and stdout as output by default)
-t: test whether the configuration file is correct, then exit
Define input and output streams:
standard input is used as input and standard output as output (similar to a pipe)
logstash -e 'input { stdin{} } output { stdout{} }'
Use rubydebug to display detailed output (codec is a codec plugin)
logstash -e 'input { stdin{} } output { stdout{ codec=>rubydebug } }'
Use Logstash to write information into Elasticsearch
logstash -e 'input { stdin{} } output { elasticsearch { hosts=>["192.168.163.11:9200"] } }'
Access on the host 192.168.163.1
View index information
A new logstash-<date> index appears
http://192.168.163.11:9100
Click Data Browse to view the corresponding content
(5) Configure log collection on the Apache host
apache(192.168.163.13)
The Logstash configuration file is mainly composed of three parts: input, output, and filter (as required)
chmod o+r /var/log/messages
ll /var/log/messages
vim /etc/logstash/conf.d/system.conf
input {
file{
path => "/var/log/messages"
type => "system"
start_position => "beginning"
}
}
output {
elasticsearch {
hosts => ["192.168.163.11:9200"]
index => "system-%{+YYYY.MM.dd}"
}
}
systemctl restart logstash.service
Access on the host 192.168.163.1
View index information
A new system-<date> index appears
http://192.168.163.11:9100
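The daily index name comes from the pattern system-%{+YYYY.MM.dd} in the output section: Logstash substitutes the event's timestamp, producing one index per day. A minimal sketch of the expansion (the helper function is hypothetical, for illustration only):

```python
from datetime import date

# Sketch of how Logstash expands "system-%{+YYYY.MM.dd}" for an event:
# the date pattern is replaced with the event's timestamp, one index per day
def index_name(prefix: str, d: date) -> str:
    return f"{prefix}-{d.strftime('%Y.%m.%d')}"

print(index_name("system", date(2021, 3, 4)))  # system-2021.03.04
```

Daily indexes make it easy to expire old logs by deleting whole indexes rather than individual documents.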
5. Install Kibana
node1(192.168.163.11)
Upload kibana-5.5.1-x86_64.rpm to the /opt directory
cd /opt
rpm -ivh kibana-5.5.1-x86_64.rpm
cd /etc/kibana/
cp kibana.yml kibana.yml.bak
vim kibana.yml
#Line 2: uncomment; the port Kibana serves on (default 5601)
server.port: 5601
#Line 7: uncomment and modify; the address Kibana listens on
server.host: "0.0.0.0"
#Line 21: uncomment and modify; connect Kibana to elasticsearch
elasticsearch.url: "http://192.168.163.11:9200"
#Line 30: uncomment; add the .kibana index in elasticsearch
kibana.index: ".kibana"
systemctl start kibana.service
systemctl enable kibana.service
Access on the host 192.168.163.1
192.168.163.11:5601
On first login, create an index pattern named system-* (this matches the system log indexes),
then click the Create button at the bottom.
Then click the Discover button in the upper left corner and you will see the system-* data.
Click add next to host below, and the table on the right will show only the Time and host columns, which is easier to read.
(6) Collect the Apache log files (access log and error log) on the Apache host
apache(192.168.163.13)
cd /etc/logstash/conf.d/
vim apache_log.conf
input {
file{
path => "/etc/httpd/logs/access_log"
type => "access"
start_position => "beginning"
}
file{
path => "/etc/httpd/logs/error_log"
type => "error"
start_position => "beginning"
}
}
output {
if [type] == "access" {
elasticsearch {
hosts => ["192.168.163.11:9200"]
index => "apache_access-%{+YYYY.MM.dd}"
}
}
if [type] == "error" {
elasticsearch {
hosts => ["192.168.163.11:9200"]
index => "apache_error-%{+YYYY.MM.dd}"
}
}
}
/usr/share/logstash/bin/logstash -f apache_log.conf
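The conditional blocks in apache_log.conf route each event by its type field to a separate daily index. A minimal sketch of that routing logic (the fallback branch is an assumption for illustration, not part of the original config):

```python
# Sketch of the conditional routing in apache_log.conf:
# events with type "access" or "error" go to different daily indexes
def output_index(event: dict, day: str) -> str:
    if event["type"] == "access":
        return f"apache_access-{day}"
    if event["type"] == "error":
        return f"apache_error-{day}"
    return f"logstash-{day}"  # hypothetical fallback, not in the original config

print(output_index({"type": "error"}, "2021.03.04"))  # apache_error-2021.03.04
```

Separating access and error logs into their own indexes lets Kibana chart each stream independently.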
Access on the host 192.168.163.1
Open http://192.168.163.13 in a browser to generate some access records for the site.
Open the browser at http://192.168.163.11:9100/ to view the index information;
you will find apache_error-2021.03.04 and apache_access-2021.03.04.
Open the browser at http://192.168.163.11:5601,
click the Management option in the lower left corner, then Index Patterns, then Create Index Pattern,
and create index patterns for apache_error-* and apache_access-* respectively.
Summary
This architecture still has imperfections, so it can be further optimized and extended, for example into an EFK architecture.
An EFK architecture is Elasticsearch + Logstash + Filebeat + Kafka + Kibana + Redis, where Elasticsearch indexes and stores the data; Logstash handles format conversion; Filebeat (a lightweight log shipper) collects the logs; Kafka (a message queue that can handle hundreds of thousands of concurrent messages per second) and Redis (a caching service) absorb high concurrency; and Kibana displays the data on the front end.
The details are beyond the scope of this article; interested readers can explore them on their own.