ELK & ELFK log analysis system

Table of contents

1. Overview of ELK

2. Why use ELK

3. Basic features of a complete log system

4. Working principle of ELK

5. ELK Elasticsearch cluster deployment (operating on Node1 and Node2 nodes)

1. Environmental preparation

2. Deploy Elasticsearch software

3. Install the Elasticsearch-head plug-in

6. ELK Logstash deployment (operated on Apache node)


1. Overview of ELK

The ELK platform is a complete centralized log processing solution that combines three open source tools, Elasticsearch, Logstash, and Kibana, to meet more demanding needs for log querying, sorting, and statistics.

●Elasticsearch: a distributed storage and retrieval engine built on Lucene (a full-text search engine library), used to store all kinds of logs.
Elasticsearch is developed in Java and provides a RESTful web interface, so users can communicate with it through a browser.
Elasticsearch is a near-real-time (newly indexed data typically becomes searchable within about 1 second), distributed, and scalable search engine that supports full-text and structured search. It is commonly used to index and search large volumes of log data, and can also be used to search many other types of documents.

●Kibana: Kibana is usually deployed together with Elasticsearch. It is a powerful data visualization dashboard for Elasticsearch that provides a graphical web interface for browsing Elasticsearch log data and can be used to summarize, analyze, and search important data.

●Logstash: acts as the data collection engine. It supports dynamically collecting data from various sources, then filtering, analyzing, enriching, and unifying the format of that data before storing it in a user-specified location, usually Elasticsearch.
Logstash is written in Ruby and runs on the Java Virtual Machine (JVM). It is a powerful data processing tool that handles data transport, format processing, and formatted output. Logstash has a rich plug-in ecosystem and is often used for log processing.
Functions: input (data collection), filter (data filtering), output (data output)

#Other components that can be added:

●Filebeat: a lightweight open source log file data collector. Filebeat is usually installed on the clients whose data needs to be collected, with the directories and log format specified; it can quickly collect data and send it to Logstash for parsing, or directly to Elasticsearch for storage. It has an obvious performance advantage over Logstash running on the JVM and is often used as a replacement for it on the collection side, typically in the ELFK architecture.
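For reference only, a minimal Filebeat configuration might look like the sketch below. This is an illustration rather than part of the deployment in this article; it assumes a Filebeat 6.x-style filebeat.yml (older 5.x releases use filebeat.prospectors instead of filebeat.inputs) and that a Logstash Beats listener runs on the Apache node (192.168.10.15) on port 5044, both of which are assumptions.

vim /etc/filebeat/filebeat.yml
filebeat.inputs:								#Where Filebeat reads logs from (filebeat.prospectors in 5.x)
- type: log
  paths:
    - /var/log/httpd/access_log					#Example log file to collect
  fields:
    service: apache								#Optional custom field attached to every event
output.logstash:								#Ship events to Logstash instead of writing to Elasticsearch directly
  hosts: ["192.168.10.15:5044"]					#Hypothetical Beats listener address and port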


#Benefits of combining filebeat with logstash:
1) Logstash has a disk-based adaptive buffering system that absorbs incoming throughput, easing the pressure of continuously writing data into Elasticsearch
2) Pull from other data sources, such as databases, S3 object storage, or message queues
3) Send data to multiple destinations, such as S3, HDFS (Hadoop Distributed File System), or a file
4) Use conditional data flow logic to compose more complex processing pipelines (a matching Logstash pipeline sketch follows this list)
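On the Logstash side, a matching pipeline that accepts the events shipped by Filebeat and forwards them to Elasticsearch could look like the following sketch. Again this is only an illustration under the same assumptions (a Beats listener on port 5044; Elasticsearch on 192.168.181.101:9200 as in the deployment below), not one of the steps of this article.

vim /etc/logstash/conf.d/filebeat.conf
input {
    beats {
        port => 5044								#Listen for events shipped by Filebeat (hypothetical port)
    }
}
output {
    elasticsearch {
        hosts => ["192.168.181.101:9200"]			#Elasticsearch node used later in this deployment
        index => "filebeat-%{+YYYY.MM.dd}"			#Illustrative daily index name
    }
}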
 

●Cache/message queue (Redis, Kafka, RabbitMQ, etc.): buffers and smooths traffic peaks for high-concurrency log data. Such buffering protects data from loss to a certain extent and also decouples the whole architecture from the applications.
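As an illustration of such buffering, the collection layer can write to a Kafka topic while a separate Logstash "indexer" pipeline consumes from that topic and writes to Elasticsearch at its own pace. The sketch below assumes a Kafka broker at 192.168.10.20:9092 and a topic named app-logs (both hypothetical) and the standard Logstash kafka input/output plugins.

#Shipper side: push collected events into Kafka instead of Elasticsearch
output {
    kafka {
        bootstrap_servers => "192.168.10.20:9092"	#Hypothetical Kafka broker address
        topic_id => "app-logs"						#Hypothetical topic that acts as the buffer
    }
}

#Indexer side: consume from Kafka and write to Elasticsearch
input {
    kafka {
        bootstrap_servers => "192.168.10.20:9092"
        topics => ["app-logs"]
        group_id => "logstash-indexer"				#Illustrative consumer group name
    }
}
output {
    elasticsearch {
        hosts => ["192.168.181.101:9200"]
        index => "app-logs-%{+YYYY.MM.dd}"
    }
}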

●Fluentd: a popular open source data collector. Because Logstash is relatively heavyweight, with lower performance and higher resource consumption, Fluentd emerged as an alternative. Compared with Logstash, Fluentd is easier to use, consumes fewer resources, performs better, and is more efficient and reliable in data processing, so it has been well received by enterprises and has become a replacement for Logstash; it is often used in the EFK architecture. EFK is also a common log data collection solution in Kubernetes clusters.
In a Kubernetes cluster, Fluentd is usually run as a DaemonSet so that a Pod runs on every Kubernetes worker node. It works by reading the container log files, filtering and transforming the log data, and then delivering the data to the Elasticsearch cluster, where it is indexed and stored.
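For orientation only, a stripped-down Fluentd DaemonSet might look like the sketch below. It assumes the community fluent/fluentd-kubernetes-daemonset image, which reads the Elasticsearch address from the FLUENT_ELASTICSEARCH_HOST/PORT environment variables; the names, namespace, image tag, and service address are illustrative, and EFK on Kubernetes is outside the scope of the deployment steps in this article.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch	#Illustrative image tag
        env:
        - name: FLUENT_ELASTICSEARCH_HOST			#Where the Elasticsearch service can be reached
          value: "elasticsearch.logging.svc"		#Hypothetical service address
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog								#Node-level log directory containing container logs
          mountPath: /var/log
      volumes:
      - name: varlog
        hostPath:
          path: /var/log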

2. Why use ELK

  • Logs mainly include system logs, application logs, and security logs. Operations staff and developers can use logs to learn about the server's software and hardware, and to check configuration errors and their causes. Analyzing logs regularly helps you understand the server's load, performance, and security, so that you can take timely measures to correct problems.
  • Simple analysis of the logs of a single machine can usually be done with tools such as grep and awk, but logs are often scattered across different devices. If you manage dozens or hundreds of servers, logging in to each machine in turn to view its logs in the traditional way is cumbersome and inefficient. Centralized log management, for example with open source syslog, collects and aggregates the logs of all servers. Once logs are managed centrally, however, log statistics and retrieval become troublesome: Linux commands such as grep, awk, and wc can still handle simple retrieval and counting, but for more demanding querying, sorting, and statistics across a huge number of machines, this approach falls short.
  • Large-scale systems are generally deployed as distributed architectures, with different service modules on different servers. When a problem occurs, in most cases you need to locate the specific server and service module based on the key information the problem exposes; building a centralized log system improves the efficiency of locating problems.

3. Basic features of a complete log system

Collection: can collect log data from multiple sources.
Transmission: can parse, filter, and transmit log data to the storage system stably.
Storage: stores the log data.
Analysis: supports UI-based analysis.
Alerting: can provide error reporting and monitoring mechanisms.

4. Working principle of ELK

(1) Deploy Logstash on all servers whose logs need to be collected, or first aggregate the logs on a central log server and deploy Logstash there.
(2) Logstash collects the logs, formats them, and outputs them to the Elasticsearch cluster.
(3) Elasticsearch indexes and stores the formatted data.
(4) Kibana queries the data from the ES cluster, generates charts, and displays them to the front end.

Summary: Logstash, as the log collector, gathers data from the data sources, filters and formats it, and then sends it to Elasticsearch for storage; Kibana visualizes the logs.

5. ELK Elasticsearch cluster deployment (operating on Node1 and Node2 nodes)

1. Environmental preparation

Node1 node (2C/4G): node1/192.168.181.101				Elasticsearch  Kibana
Node2 node (2C/4G): node2/192.168.181.102				Elasticsearch
Apache node:        apache/192.168.10.15				Logstash  Apache



systemctl stop firewalld
setenforce 0
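Since the configuration below refers to the nodes by name (node.name and discovery.zen.ping.unicast.hosts use node1 and node2), make sure the hostnames are set and resolvable on both nodes. A minimal sketch:

hostnamectl set-hostname node1					#Run on Node1; use node2 on the Node2 node

#Map the node names to their IP addresses on both nodes
echo "192.168.181.101 node1" >> /etc/hosts
echo "192.168.181.102 node2" >> /etc/hosts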

Note: Java version
java -version										#If Java is not installed: yum -y install java
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b12)
OpenJDK 64-Bit Server VM (build 25.131-b12, mixed mode)

Using the official JDK is recommended.

2. Deploy Elasticsearch software

(1) Install the elasticsearch RPM package
#Upload elasticsearch-5.5.0.rpm to the /opt directory
cd /opt
rpm -ivh elasticsearch-5.5.0.rpm 

(2) Load the system service
systemctl daemon-reload    
systemctl enable elasticsearch.service

(3) Modify the main Elasticsearch configuration file
cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.bak
vim /etc/elasticsearch/elasticsearch.yml
--17--Uncomment and set the cluster name
cluster.name: my-elk-cluster
--23--Uncomment and set the node name: node1 on the Node1 node, node2 on the Node2 node
node.name: node1
--33--Uncomment and set the data storage path
path.data: /data/elk_data
--37--Uncomment and set the log storage path
path.logs: /var/log/elasticsearch/
--43--Uncomment and change it so that memory is not locked at startup
bootstrap.memory_lock: false
--55--Uncomment and set the listen address; 0.0.0.0 means all addresses
network.host: 0.0.0.0
--59--Uncomment; the default listen port of the ES service is 9200
http.port: 9200
--68--Uncomment; cluster discovery is done via unicast, so specify the nodes to discover: node1 and node2
discovery.zen.ping.unicast.hosts: ["node1", "node2"]

grep -v "^#" /etc/elasticsearch/elasticsearch.yml

(4) Create the data storage path and grant ownership
mkdir -p /data/elk_data
chown elasticsearch:elasticsearch /data/elk_data/

(5) Start elasticsearch and check whether it started successfully
systemctl start elasticsearch.service
netstat -antp | grep 9200

(6) View node information
Open http://192.168.181.101:9200 and http://192.168.181.102:9200 in a browser to view the information of nodes Node1 and Node2.

Open http://192.168.181.101:9200/_cluster/health?pretty and http://192.168.181.102:9200/_cluster/health?pretty in a browser to check the cluster health; a status value of green means the nodes are running healthily.

Open http://192.168.181.101:9200/_cluster/state?pretty in a browser to check the cluster state information.
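The same checks can also be done from the command line with curl, for example:

curl 'http://192.168.181.101:9200/'							#Basic node information
curl 'http://192.168.181.101:9200/_cluster/health?pretty'	#Cluster health; status should be green
curl 'http://192.168.181.101:9200/_cluster/state?pretty'	#Full cluster state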

#Checking the cluster status this way is not very user-friendly; installing the Elasticsearch-head plug-in makes it easier to manage the cluster.

3. Install the Elasticsearch-head plug-in

  • Since Elasticsearch version 5.0, the Elasticsearch-head plug-in must be installed as an independent service, using the npm tool (the Node.js package manager).
  • Installing Elasticsearch-head requires installing its dependencies, node and phantomjs, in advance.
  • node: a JavaScript runtime environment based on the Chrome V8 engine.
  • phantomjs: a WebKit-based JavaScript API that can be thought of as an invisible (headless) browser; it can do anything a WebKit-based browser can do.

 

(1) Compile and install node
#Upload the package node-v8.2.1.tar.gz to /opt
yum install gcc gcc-c++ make -y

cd /opt
tar zxvf node-v8.2.1.tar.gz

cd node-v8.2.1/
./configure
make && make install

(2) Install phantomjs (a front-end framework)
#Upload the package phantomjs-2.1.1-linux-x86_64.tar.bz2 to /opt
cd /opt
tar jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2 -C /usr/local/src/
cd /usr/local/src/phantomjs-2.1.1-linux-x86_64/bin
cp phantomjs /usr/local/bin

(3) Install the Elasticsearch-head data visualization tool
#Upload the package elasticsearch-head.tar.gz to /opt
cd /opt
tar zxvf elasticsearch-head.tar.gz -C /usr/local/src/
cd /usr/local/src/elasticsearch-head/
npm install

(4) Modify the main Elasticsearch configuration file
vim /etc/elasticsearch/elasticsearch.yml
......
--Append the following at the end of the file--
http.cors.enabled: true				#Enable cross-origin access support; the default is false
http.cors.allow-origin: "*"			#Allow cross-origin access from any domain

systemctl restart elasticsearch
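To verify the CORS settings from the command line, you can send a request with an Origin header and look for the Access-Control-Allow-Origin header in the response; this is a quick check, not one of the original steps:

curl -I -H "Origin: http://192.168.181.101:9100" http://192.168.181.101:9200/
#The response headers should include: Access-Control-Allow-Origin: *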

(5) Start the elasticsearch-head service
#The service must be started from the extracted elasticsearch-head directory, because the process reads the gruntfile.js file in that directory; otherwise it may fail to start.
cd /usr/local/src/elasticsearch-head/
npm run start &

> elasticsearch-head@0.0.0 start /usr/local/src/elasticsearch-head
> grunt server

Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://localhost:9100

#elasticsearch-head listens on port 9100
netstat -natp |grep 9100

(6) View Elasticsearch information through Elasticsearch-head
Open http://192.168.181.101:9100/ in a browser and connect to the cluster. If the cluster health value is shown as green, the cluster is healthy.

(7) Insert an index
#Insert a test index named index-demo with type test using the following command:

curl -X PUT 'localhost:9200/index-demo/test/1?pretty&pretty' -H 'content-Type: application/json' -d '{"user":"zhangsan","mesg":"hello world"}'
//The output is as follows:
{
"_index" : "index-demo",
"_type" : "test",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"created" : true
}

Open http://192.168.181.101:9100/ in a browser to view the index information; you can see that the index has 5 shards by default and one replica.
Click "Data Browse" and you will see the index index-demo with type test that was created on node1.
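You can also fetch the test document back from Elasticsearch directly to confirm that it was stored, for example:

curl -X GET 'localhost:9200/index-demo/test/1?pretty'
#Returns the stored document, including "_source" : { "user" : "zhangsan", "mesg" : "hello world" }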

6. ELK Logstash deployment (operated on Apache node)

Logstash is generally deployed on servers whose logs need to be monitored. In this case, Logstash is deployed on the Apache server to collect log information from the Apache server and send it to Elasticsearch.
 

1. Change the hostname
hostnamectl set-hostname apache

2. Install the Apache service (httpd)
yum -y install httpd
systemctl start httpd

3. Install the Java environment
yum -y install java
java -version

4. Install logstash
#Upload the package logstash-5.5.1.rpm to the /opt directory
cd /opt
rpm -ivh logstash-5.5.1.rpm                           
systemctl start logstash.service                      
systemctl enable logstash.service

ln -s /usr/share/logstash/bin/logstash /usr/local/bin/

5. Test Logstash
Common Logstash command options:
-f: specify a Logstash configuration file; Logstash's input and output streams are configured according to that file.
-e: take the configuration from the command line; the option is followed by a string that is treated as the Logstash configuration (if it is empty, stdin is used as input and stdout as output by default).
-t: test whether the configuration file is correct, then exit (see the example below).
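For example, once a configuration file exists (such as the system.conf created in step 6 below), -f and -t can be combined to check its syntax without starting a pipeline:

logstash -f /etc/logstash/conf.d/system.conf -t
#Should print "Configuration OK" and exit if the file is valid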

Define input and output streams:
#Use standard input as the input and standard output as the output (similar to a pipe)
logstash -e 'input { stdin{} } output { stdout{} }'
......
www.baidu.com										#Typed content (standard input)
2020-12-22T03:58:47.799Z node1 www.baidu.com		#Output result (standard output)
www.sina.com.cn										#Typed content (standard input)
2017-12-22T03:59:02.908Z node1 www.sina.com.cn		#Output result (standard output)

//Press ctrl+c to exit

#Use the rubydebug codec to display the output in detailed format (a codec is an encoder/decoder)
logstash -e 'input { stdin{} } output { stdout{ codec=>rubydebug } }'
......
www.baidu.com										#Typed content (standard input)
{
    "@timestamp" => 2020-12-22T02:15:39.136Z,		#Output result (the processed event)
      "@version" => "1",
          "host" => "apache",
       "message" => "www.baidu.com"
}

#Use Logstash to write information into Elasticsearch
logstash -e 'input { stdin{} } output { elasticsearch { hosts=>["192.168.181.101:9200"] } }'
			 input				output			connect to ES
......
www.baidu.com										#Typed content (standard input)
www.sina.com.cn										#Typed content (standard input)
www.google.com										#Typed content (standard input)

//The results are not displayed on standard output; they are sent to Elasticsearch instead. Open http://192.168.181.101:9100/ in a browser to view the index information and browse the data.

6. Define a logstash configuration file
A Logstash configuration file basically consists of three parts: input, output, and filter (optional, used as needed).
input: collects data from data sources; common sources include Kafka, log files, and so on.
filter: the data processing layer, which includes formatting the data, converting data types, filtering data, and so on; regular expressions are supported.
output: sends the data collected by Logstash, after it has been processed by the filters, to Elasticsearch.

#The format is as follows:
input {...}
filter {...}
output {...}

#Multiple sources can also be specified in each section. For example, to specify two log source files, the format is as follows:
input {
	file { path =>"/var/log/messages" type =>"syslog"}
	file { path =>"/var/log/httpd/access.log" type =>"apache"}
}
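The filter section described above is not used in the configuration that follows, but as an illustration, a grok filter could parse Apache access-log lines into structured fields. A minimal sketch using the built-in COMBINEDAPACHELOG pattern:

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }	#Parse the raw line into fields such as clientip, verb, request
    }
}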

#Modify the Logstash configuration file so that it collects the system log /var/log/messages and outputs it to elasticsearch.
chmod +r /var/log/messages					#Make the log readable by Logstash

vim /etc/logstash/conf.d/system.conf
input {
    file{
        path =>"/var/log/messages"						#Location of the log to collect
        type =>"system"									#Custom log type tag
        start_position =>"beginning"					#Collect from the beginning of the file
    }
}
output {
    elasticsearch {										#Output to elasticsearch
        hosts => ["192.168.181.101:9200"]					#Address and port of the elasticsearch server
        index =>"system-%{+YYYY.MM.dd}"					#Index name pattern to use in elasticsearch
    }
}

systemctl restart logstash 

Open http://192.168.181.101:9100/ in a browser to view the index information.
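The new index can also be confirmed against Elasticsearch directly from the command line, for example:

curl 'http://192.168.181.101:9200/_cat/indices?v'			#Lists all indices; a system-YYYY.MM.dd index should appear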
