Unified ELK log management platform, part III: using the logstash grok plugin


  In this post I mainly cover the following points of knowledge and hands-on experience, for your reference:

  1. A log content standard for Java applications;

  2. How to use the logstash grok plugin to split the message field;

  3. Scheduled deletion of ES indices.

1. A log content standard for Java applications:

  Recently the company has been pushing the ELK project hard, and I am the operations engineer responsible for it, so a lot of hands-on experience from the project will be written up here. Our business systems are developed mainly in Java, mostly on frameworks such as Spring Cloud and Spring Boot, so how to standardize the business systems' logs is a question the architects and developers need to settle. Our current log specification for ELK is defined as follows:

<pattern>[%date{ISO8601}][%level] %logger{80} [%thread] Line:%-3L [%X{TRACE_ID}] ${dev-group-name}
${app-name} - %msg%n</pattern>
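
For context, this pattern sits inside the application's logback configuration; a minimal sketch of the surrounding appender (the appender name and log path here are assumptions, not our actual settings):

<configuration>
    <appender name="APP_LOG" class="ch.qos.logback.core.FileAppender">
        <file>/app/logs/app-info.log</file>
        <encoder>
            <!-- the <pattern> shown above goes here; ${dev-group-name} and ${app-name} are properties defined elsewhere in the config -->
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="APP_LOG"/>
    </root>
</configuration>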

|时间|日志级别|类文件|线程数|代码发生行|全局流水号|开发团队|系统名称|日志信息

时间:记录日志产生时间;
日志级别:ERROR、WARN、INFO、DEBUG;
类文件:打印类文件名称;
线程名:执行操作线程名称;
代码发生行:日志事件发生在代码中位置;
全局流水号:贯穿一次业务流程的全局流水号;
开发团队: 系统开发的团队名称
系统名称:项目名称组建名
INFO: 记录详细日志信息

For example, a business system's log output in the standard format looks like this:

[2019-06-24 09:32:14,262] [ERROR] com.bqjr.cmm.aps.job.ApsAlarmJob [scheduling-1] []
tstteam tst Line:157 - ApsAlarmJob类execute方法,'【测试系统预警】校验指标异常三次预警'预警出错:nested
exception is org.apache.ibatis.exceptions.PersistenceException: ### Error
querying database. Cause: java.lang.NullPointerException ### Cause:
java.lang.NullPointerException org.mybatis.spring.MyBatisSystemException:
nested exception is

2. How to use the logstash grok plugin to split the message field:

  All fields are now logged according to this standard, but in the Kibana interface they still show up as a single message field. The message now has to be decomposed so that every field can be searched on individually.

  The architecture of our ELK log platform: every business system has the filebeat log collector installed; filebeat ships the logs unmodified to a Kafka cluster; Kafka forwards them to the logstash cluster; logstash outputs to the ES cluster; and Kibana reads from ES to display and search them. Why is logstash used in the middle? Mainly because logstash has powerful text-processing capabilities, such as the grok plugin, which make formatted output from raw text possible.
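
For reference, the filebeat end of this chain is just a log input plus a Kafka output; a minimal sketch (the log path is an assumption, the topic matches the topics_pattern used later, key names follow the filebeat 6.x reference, so adjust to your version):

filebeat.prospectors:
- type: log
  paths:
    - /app/logs/app-info.log

output.kafka:
  hosts: ["192.168.1.12:9092","192.168.1.14:9092","192.168.1.15:9092"]
  topic: "elk-tst-tst-info"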

  logstash ships with many built-in regular-expression templates that can be used to match nginx, httpd, syslog and other log formats;

# Default path of the grok pattern templates bundled with logstash:
/usr/local/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns

# Grok pattern templates that ship with logstash:
[root@SZ1PRDELK00AP005 patterns]# ll
total 116
-rw-r--r-- 1 root root   271 Jun 24 16:05 application
-rw-r--r-- 1 root root  1831 Apr 13  2018 aws
-rw-r--r-- 1 root root  4831 Apr 13  2018 bacula
-rw-r--r-- 1 root root   260 Apr 13  2018 bind
-rw-r--r-- 1 root root  2154 Apr 13  2018 bro
-rw-r--r-- 1 root root   879 Apr 13  2018 exim
-rw-r--r-- 1 root root 10095 Apr 13  2018 firewalls
-rw-r--r-- 1 root root  5338 Apr 13  2018 grok-patterns
-rw-r--r-- 1 root root  3251 Apr 13  2018 haproxy
-rw-r--r-- 1 root root   987 Apr 13  2018 httpd
-rw-r--r-- 1 root root  1265 Apr 13  2018 java
-rw-r--r-- 1 root root  1087 Apr 13  2018 junos
-rw-r--r-- 1 root root  1037 Apr 13  2018 linux-syslog
-rw-r--r-- 1 root root    74 Apr 13  2018 maven
-rw-r--r-- 1 root root    49 Apr 13  2018 mcollective
-rw-r--r-- 1 root root   190 Apr 13  2018 mcollective-patterns
-rw-r--r-- 1 root root   614 Apr 13  2018 mongodb
-rw-r--r-- 1 root root  9597 Apr 13  2018 nagios
-rw-r--r-- 1 root root   142 Apr 13  2018 postgresql
-rw-r--r-- 1 root root   845 Apr 13  2018 rails
-rw-r--r-- 1 root root   224 Apr 13  2018 redis
-rw-r--r-- 1 root root   188 Apr 13  2018 ruby
-rw-r--r-- 1 root root   404 Apr 13  2018 squid
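
Any of these bundled templates can be referenced by name inside a grok filter. For example, a filter like the following sketch (not part of our pipeline) would parse a standard httpd/nginx combined access log:

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}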

# One of them is a java template, which already has many built-in patterns for Java classes, timestamps and so on
[root@SZ1PRDELK00AP005 patterns]# cat java
JAVACLASS (?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]*
#Space is an allowed character to match special cases like 'Native Method' or 'Unknown Source'
JAVAFILE (?:[A-Za-z0-9_. -]+)
#Allow special <init>, <clinit> methods
JAVAMETHOD (?:(<(?:cl)?init>)|[a-zA-Z$_][a-zA-Z$_0-9]*)
#Line number is optional in special cases 'Native method' or 'Unknown source'
JAVASTACKTRACEPART %{SPACE}at %{JAVACLASS:class}\.%{JAVAMETHOD:method}\(%{JAVAFILE:file}(?::%{NUMBER:line})?\)
# Java Logs
JAVATHREAD (?:[A-Z]{2}-Processor[\d]+)
JAVACLASS (?:[a-zA-Z0-9-]+\.)+[A-Za-z0-9$]+
JAVAFILE (?:[A-Za-z0-9_.-]+)
JAVALOGMESSAGE (.*)
# MMM dd, yyyy HH:mm:ss eg: Jan 9, 2014 7:13:13 AM
CATALINA_DATESTAMP %{MONTH} %{MONTHDAY}, 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) (?:AM|PM)
# yyyy-MM-dd HH:mm:ss,SSS ZZZ eg: 2014-01-09 17:32:25,527 -0800
TOMCAT_DATESTAMP 20%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) %{ISO8601_TIMEZONE}
CATALINALOG %{CATALINA_DATESTAMP:timestamp} %{JAVACLASS:class} %{JAVALOGMESSAGE:logmessage}
# 2014-01-09 20:03:28,269 -0800 | ERROR | com.example.service.ExampleService - something compeletely unexpected happened...
TOMCATLOG %{TOMCAT_DATESTAMP:timestamp} \| %{LOGLEVEL:level} \| %{JAVACLASS:class} - %{JAVALOGMESSAGE:logmessage}
[root@SZ1PRDELK00AP005 patterns]#

# The default templates alone still cannot match our company's custom log format, so I also wrote one of my own
[root@SZ1PRDELK00AP005 patterns]# cat application
APP_DATESTAMP 20%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:?%{MINUTE}(?::?%{SECOND})
THREADS_NUMBER (?:[a-zA-Z0-9-]+)
GLOBAL_PIPELINE_NUMBER (?:[a-zA-Z0-9-]+)
DEV_TEAM (?:[a-zA-Z0-9-]+)
SYSTEM_NAME (?:[a-zA-Z0-9-]+)
LINE_NUMBER (Line:[0-9]+)
JAVALOGMESSAGE (.*)
APPLOG \[%{APP_DATESTAMP:timestamp}\] \[%{LOGLEVEL:loglevel}\] %{JAVACLASS:class} \[%{THREADS_NUMBER:threads_number}\] \[%{GLOBAL_PIPELINE_NUMBER:global_pipeline_number}\] %{DEV_TEAM:team} %{SYSTEM_NAME:system_name} %{LINE_NUMBER:linenumber} %{JAVALOGMESSAGE:logmessage}

# Next, configure logstash

[root@SZ1PRDELK00AP005 patterns]# cat /usr/local/logstash/config/yunyan.conf
input {
  kafka {
    bootstrap_servers => "192.168.1.12:9092,192.168.1.14:9092,192.168.1.15:9092"
    topics_pattern => "elk-tst-tst-info.*"
    group_id => "test-consumer-group"
    codec => json
    consumer_threads => 3
    decorate_events => true
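    # decorate_events adds the Kafka metadata (topic, partition, offset) under [@metadata][kafka]; the topic name is reused in the index name below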
    auto_offset_reset => "latest"
  }
}

filter {
    grok {
             match => {"message" => ["%{APPLOG}","%{JAVALOGMESSAGE:message}"]}  # note: APPLOG here is the custom pattern name defined above
             overwrite => ["message"]

}
}

output {
  elasticsearch {
     hosts => ["192.168.1.19:9200","192.168.1.24:9200"]
     user => "elastic"
     password => "111111"
     index => "%{[@metadata][kafka][topic]}-%{+YYYY-MM-dd}"
     workers => 1
  }
}

#output {
#   stdout{
#      codec => "rubydebug"
#  }
#}

# When debugging, it is generally recommended to send output to stdout first rather than straight to ES; once the standard output confirms everything is OK and all the formatted fields come out separately, switch the output to ES.
# To help write the grok regular expressions correctly, there is an online grok expression tester: http://grokdebug.herokuapp.com/
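
Besides the online debugger, the configuration and the custom pattern can be checked locally before touching the Kafka pipeline; a minimal sketch (paths are the ones shown earlier, adjust to your install); paste a sample log line on stdin and inspect the parsed fields in the rubydebug output:

# Check the pipeline file syntax only:
/usr/local/logstash/bin/logstash -f /usr/local/logstash/config/yunyan.conf --config.test_and_exit

# Run a throwaway stdin -> grok -> stdout pipeline against the custom APPLOG pattern:
/usr/local/logstash/bin/logstash -e '
input { stdin {} }
filter {
  grok {
    patterns_dir => ["/usr/local/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns"]
    match => { "message" => "%{APPLOG}" }
  }
}
output { stdout { codec => rubydebug } }'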

  Once the logs are output in the standard format, you can search in key:value form. For example, entering loglevel: ERROR in the search bar returns only entries whose log level is ERROR.
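
The same fields can also be queried against Elasticsearch directly; a quick sketch with curl (host, index pattern and credentials are the ones used in the logstash output above):

# List the daily indices created by the pipeline above:
curl -u elastic:111111 -XGET "http://192.168.1.19:9200/_cat/indices/elk-tst-tst-info*?v"

# Query one index pattern for ERROR-level entries only:
curl -u elastic:111111 -XGET "http://192.168.1.19:9200/elk-tst-tst-info*/_search?q=loglevel:ERROR&size=5&pretty"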

3. Scheduled deletion of ES indices:

  The index name is defined in the logstash elasticsearch output plugin. An index split by day ends with -%{+YYYY-MM-dd}; to switch to a monthly index, change the suffix to -%{+YYYY-MM}. Different kinds of content should be indexed differently. Operating-system logs, for example, do not change much from day to day and can be split by month, whereas business-system logs, which produce far more data per day, are better split by day. For Elasticsearch, an index that is too large hurts performance, and having too many indices hurts it as well; in my experience the main performance bottleneck of Elasticsearch is the CPU.

While operating this ELK project I found that oversized index files and too many indices, combined with the low CPU configuration of our ES data nodes, caused the ES cluster to collapse. There are several ways to deal with this: the first is to delete useless indices on a schedule, and the second is to tune the ES index parameters. I have not yet put the second into practice and will summarize it in a follow-up document; here I will describe the scheduled-deletion method and how to delete indices manually.

#!/bin/bash
# Target date (7 days ago)
DATA=`date -d "1 week ago" +%Y-%m-%d`

# Current date
time=`date`

# Delete the indices from 7 days ago
curl -u elastic:654321 -XGET "http://192.168.1.19:9200/_cat/indices/?v"|grep $DATA
if [ $? == 0 ];then
  curl -u elastic:654321 -XDELETE "http://127.0.0.1:9200/*-${DATA}"
  echo "于 $time 清理 $DATA 索引!"
fi
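
To make the cleanup genuinely scheduled, the script can be run from cron; an example crontab entry (the script path and schedule are assumptions, adjust to taste):

# Run the index cleanup script every day at 01:00 and keep a log of what it did
0 1 * * * /usr/local/scripts/clean_es_index.sh >> /var/log/clean_es_index.log 2>&1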

# Manual index deletion: dump the index names to a text file, then delete them in a loop
curl -u elastic:654321 -XGET "http://192.168.1.19:9200/_cat/indices/?v"|awk '{print $3}'|grep elk >> /tmp/es.txt
for i in `cat /tmp/es.txt`;do curl -u elastic:654321 -X DELETE "192.168.1.19:9200/$i";done

  Well, that is all for today. Work has been particularly busy recently, so it is hard to find time to update this technical blog; most updates get written late at night after overtime or early in the morning. Thank you all for your continued attention.

For more detailed posts, please follow my personal WeChat official account "cloud era IT operations". The account shares operations technology and new trends in Internet operations, including IT operations industry news and technical documentation, with a focus on devops, jenkins, zabbix monitoring, kubernetes, ELK, various middleware such as redis and MQ, and scripting languages for operations such as shell and python. I have worked in IT operations for more than a decade and have been doing full-time Linux/Unix system operations since 2008, so I have a reasonable understanding of the related technologies. All posts on the account are summaries of my actual work experience and essentially original. I am very happy to share the experience and techniques I have accumulated, and I hope we can grow and progress together along our IT operations career paths.
