How to use the ELK aggregated log system in a distributed environment

Introduction to ELK

Most readers are probably familiar with the ELK log stack. If your system runs as a cluster with multiple instances, logging in to each server to read log files is inconvenient: front-end requests are routed to back-end instances more or less at random, so you need an aggregated log query system. ELK is a good choice.

In short, ELK is Elasticsearch + Logstash + Kibana. Logstash parses logs into the desired format, Elasticsearch indexes them, and Kibana displays them.


On top of these we add a collector, Filebeat, which gathers the logs from each app instance.

The system architecture diagram is as follows: [Figure: app nodes with Filebeat → Logstash → Elasticsearch → Kibana]

ELK configuration

The ELK environment requires a JDK. I will only cover some basic configuration here; detailed installation guides are easy to find online. Interested readers can start with this article:

https://blog.51cto.com/54dev/2570811?source=dra

After unpacking Logstash, we need to edit the startup.options file:

vim logstash/config/startup.options

LS_HOME=/home/elk/support/logstash
LS_SETTINGS_DIR="${LS_HOME}/config"

LS_OPTS="--path.settings ${LS_SETTINGS_DIR}"
LS_JAVA_OPTS=""

LS_PIDFILE=${LS_HOME}/logs/logstash.pid

LS_USER=elk
LS_GROUP=elk

LS_GC_LOG_FILE=${LS_HOME}/logs/logstash-gc.log
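
If you want Logstash managed as a system service, the bin/system-install helper that ships with Logstash reads exactly this startup.options file to generate a service definition. A minimal sketch, using the LS_HOME path from above:

# Generate a service definition (systemd/upstart/SysV is auto-detected)
# from config/startup.options; run with root privileges
sudo /home/elk/support/logstash/bin/system-install

# Afterwards Logstash can be managed by the service manager, e.g. on systemd:
sudo systemctl start logstash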

We also need a pipeline configuration that tells Logstash which port to listen on for Beats input, and the address of Elasticsearch:

vim logstash/config/logstash-aicrm-with-filebeat.yml

input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => "192.168.205.129:9200"
    index => "aicrm-app-node"
  }
}
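
It is worth validating this pipeline file before wiring everything together. Logstash does not care about the file extension; -f simply takes a path, and --config.test_and_exit (a standard Logstash CLI flag) checks the syntax:

# Syntax-check the pipeline, then start Logstash with it
bin/logstash -f config/logstash-aicrm-with-filebeat.yml --config.test_and_exit
bin/logstash -f config/logstash-aicrm-with-filebeat.yml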

For Filebeat, we configure which log files to collect, the multiline settings that merge continuation lines (such as Java stack traces, which do not start with "[") into a single event, and the address of Logstash:

vim filebeat/filebeat.yml

filebeat:
  prospectors:
    -
      paths:
        - /home/elk/logs/xxx*.log
      type: log
      multiline.pattern: '^\['
      multiline.negate: true
      multiline.match: after

output:
  logstash:
    hosts: ["192.168.205.129:5044"]
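
To verify this configuration, you can first run Filebeat in the foreground; -e writes Filebeat's own log to stderr and -c selects the config file (both are standard Filebeat flags):

cd filebeat
./filebeat -e -c filebeat.yml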

Finally, we configure the Elasticsearch address in Kibana:

vim kibana/config/kibana.yml

...
server.host: "0.0.0.0"
...
elasticsearch.url: "http://192.168.205.129:9200"

After configuration, start the ELK stack.

The startup sequence is: elasticsearch ➡ logstash ➡ filebeat ➡ kibana
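
As a concrete sketch, assuming everything lives under /home/elk/support (only the Logstash path is confirmed by startup.options above; the Elasticsearch, Filebeat and Kibana paths are my assumptions), startup could look like this:

# 1. Elasticsearch (-d runs it as a daemon)
/home/elk/support/elasticsearch/bin/elasticsearch -d

# 2. Logstash with the pipeline file configured earlier
nohup /home/elk/support/logstash/bin/logstash -f /home/elk/support/logstash/config/logstash-aicrm-with-filebeat.yml &

# 3. Filebeat, on every app node
cd /home/elk/support/filebeat && nohup ./filebeat -c filebeat.yml &

# 4. Kibana
nohup /home/elk/support/kibana/bin/kibana &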

After startup, the running processes look like this: [Figure: elasticsearch, logstash, filebeat and kibana processes]

Now visit Kibana in the browser:

http://192.168.205.129:5601/app/kibana
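
If the page does not load, Kibana's status endpoint (a stock Kibana API) is a quick health check:

curl http://192.168.205.129:5601/api/status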

Here we first need to create an index pattern in Kibana matching the ES index (aicrm-app-node, from the Logstash output above).
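
Before creating the index pattern, you can confirm that the index has actually been created in Elasticsearch:

curl 'http://192.168.205.129:9200/_cat/indices?v'

# The response should list the index, e.g. something like:
# health status index          ...
# yellow open   aicrm-app-node ...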

Then we can search the log:

[Figure: searching logs in Kibana]

About log analysis

Depending on the business, ELK often needs to parse logs in several different formats, which means configuring grok rules in the Logstash configuration file. It is recommended to test grok patterns with an online tool first.

Online Grok parsing tool address: https://grokdebug.herokuapp.com/?#

Note that this site may require a proxy to access from mainland China.

Analysis example:

Online test sample:

[Figure: testing a pattern in the Grok Debugger]

In production, the grok statement is written into Logstash's configuration file (a filter sketch follows the parse result below). An example:

Exception log

2018-11-09 23:01:18.766  [ERROR]  com.xxx.rpc.server.handler.ServerHandler - An error occurred while calling com.xxx.search.server.SearchServer.search!
java.lang.reflect.InvocationTargetException
 at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)

Grok pattern

%{TIMESTAMP_ISO8601:log_time}  \[%{DATA:log_level}\] %{GREEDYDATA:message}

Analysis result

{
"log_time": [
  [
    "2018-11-09 23:01:18.766"
  ]
],
"YEAR": [
  [
    "2018"
  ]
],
"MONTHNUM": [
  [
    "11"
  ]
],
"MONTHDAY": [
  [
    "09"
  ]
],
"HOUR": [
  [
    "23",
    null
  ]
],
"MINUTE": [
  [
    "01",
    null
  ]
],
"SECOND": [
  [
    "18.766"
  ]
],
"ISO8601_TIMEZONE": [
  [
    null
  ]
],
"log_level": [
  [
    "ERROR"
  ]
],
"message": [
  [
    " com.xxx.rpc.server.handler.ServerHandler - 调用com.xxx.search.server.SearchServer.search时发生错误!"
  ]
]
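
In the Logstash configuration, this pattern goes into a filter block between the beats input and the elasticsearch output. A minimal sketch (grok's match and overwrite options are standard filter settings):

filter {
  grok {
    # The pattern tested in the debugger above
    match => { "message" => "%{TIMESTAMP_ISO8601:log_time}  \[%{DATA:log_level}\] %{GREEDYDATA:message}" }
    # Replace the raw line with the captured remainder
    overwrite => [ "message" ]
  }
}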

Let's analyze the following log:

<operation_in>请求报文:service_name接口名,sysfunc_id功能号,operator_id 操作员id,organ_id 机构号,request_seq 请求流水

2018-11-12 15:03:41.388 639211542011357848  [DEBUG]  com.base.core.aop.http.HttpClient.send(HttpClient.java:128) - 
reqid:b7fb8f90ddeb11e83d622c02b34132f7;AOP sent message: <?xml version="1.0" encoding="GBK"?><operation_in<service_name>BSM_SaleSystemLogin</service_name>
<sysfunc_id>91008027</sysfunc_id><request_type>1002</request_type><verify_code>304147201506190000000040</verify_code><operator_id>9991445</operator_id>              
 <organ_id>9999997</organ_id><request_time>20181112150341</request_time><request_seq>154200622111</request_seq><request_source>304147</request_source><request_target></request_target><msg_version>0100</msg_version><cont_version>0100</cont_version><access_token></access_token><content><request><msisdn>13xx6945211</msisdn><password>871221</password><portal_id>101704</portal_id><login_type>34</login_type><machine_mac>0000</machine_mac><machine_ip>120.33.xxx.198, 10.46.xxx.182, </machine_ip><machine_cpu></machine_cpu><machine_system_ver>12.0.1</machine_system_ver><machine_totalmemory></machine_totalmemory><machine_usablememory></machine_usablememory><machine_ie_ver></machine_ie_ver></request></content></operation_in>
<operation_out>
<operation_out><service_name>BSM_SaleSystemLogin</service_name><request_type>1002</request_type><sysfunc_id>91008027</sysfunc_id>
<request_seq>15xxx0622111</request_seq><response_time>20181112150342</response_time><response_seq>471860579309</response_seq><request_source>304147</request_source><response><resp_type>0</resp_type><resp_code>0000</resp_code><resp_desc/></response><content><response><base_info><verifycode>173616671275425657328820</verifycode><operator_id>132394</operator_id><row><msisdn>13xxx945211</msisdn><role_id>6100004</role_id><owning_mode>1</owning_mode><status>1</status><inure_time>20170623145448</inure_time><expire_time>30000101000000</expire_time><request_source>0</request_source><modify_time>20170623145448</modify_time><modify_operator_id>4020205</modify_operator_id><modify_content>创建手机号码与角色对应关系

Testing this log on the Grok Debugger site, we arrive at the following grok statement:

%{TIMESTAMP_ISO8601:log_time} %{DATA:serial_number} \[%{DATA:log_level}\] %{GREEDYDATA:message}<service_name>%{DATA:service_name}</service_name> <sysfunc_id>%{DATA:sysfunc_id}</sysfunc_id><request_type>%{DATA:other}</operator_id><organ_id>%{DATA:organ_id}</organ_id><request_time>%{DATA:request_time}</request_time><request_seq>%{DATA:request_seq}</request_seq><request_source>%{DATA:other}<operation_out>

The analysis results are as follows:

{
  "log_time": [
    [
      "2018-11-12 15:03:41.388"
    ]
  ],
  "YEAR": [
    [
      "2018"
    ]
  ],
  "MONTHNUM": [
    [
      "11"
    ]
  ],
  "MONTHDAY": [
    [
      "12"
    ]
  ],
  "HOUR": [
    [
      "15",
      null
    ]
  ],
  "MINUTE": [
    [
      "03",
      null
    ]
  ],
  "SECOND": [
    [
      "41.388"
    ]
  ],
  "ISO8601_TIMEZONE": [
    [
      null
    ]
  ],
  "serial_number": [
    [
      "639211542011357848 "
    ]
  ],
  "log_level": [
    [
      "DEBUG"
    ]
  ],
  "message": [
    [
      " com.base.core.aop.http.HttpClient.send(HttpClient.java:128) - reqid:b7fb8f90ddeb11e83d622c02b34132f7;AOP 发送信息: <?xml version="1.0" encoding="GBK"?>   <operation_in"
    ]
  ],
  "service_name": [
    [
      "BSM_SaleSystemLogin"
    ]
  ],
  "sysfunc_id": [
    [
      "91008027"
    ]
  ],
  "other": [
    [
      "1002</request_type><verify_code>304147201506190000000040</verify_code><operator_id>9991445",
      "304147</request_source><request_target></request_target><msg_version>0100</msg_version><cont_version>0100</cont_version><access_token></access_token><content><request><msisdn>13xxx945211</msisdn><password>871221</password><portal_id>101704</portal_id><login_type>34</login_type><machine_mac>0000</machine_mac><machine_ip>120.33.xxx.198, 10.46.xxx.182, </machine_ip><machine_cpu></machine_cpu><machine_system_ver>12.0.1</machine_system_ver><machine_totalmemory></machine_totalmemory><machine_usablememory></machine_usablememory><machine_ie_ver></machine_ie_ver></request></content></operation_in>"
    ]
  ],
  "organ_id": [
    [
      "9999997"
    ]
  ],
  "request_time": [
    [
      "20181112150341"
    ]
  ],
  "request_seq": [
    [
      "154xxx622111"
    ]
  ]
}
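
The two "other" captures only exist to skip over tags we do not care about, so in the real filter you would normally drop them, for example with a mutate filter (a standard Logstash filter; a sketch):

filter {
  mutate {
    # Drop the throwaway captures used to skip XML fragments
    remove_field => [ "other" ]
  }
}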

Let's parse an nginx error log:

2018/11/01 23:30:39 [error] 15105#0: *397937824 connect() failed (111: Connection refused) while connecting to upstream, client: 10.48.xxx.3, server: 127.0.0.1, request: "POST /o2o_usercenter_svc/xxx/sysUserInfoService?req_sid=1612e430ddeb11e83d622c02b34132f7&syslogid=null HTTP/1.1", upstream: "http://127.0.0.1:xxx/o2o_usercenter_svc/remote/sysUserInfoService?req_sid=1612e430ddeb11e83d622c02b34132f7&syslogid=null", host: "10.46.xxx.155:xxx"

The grok statement:

(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: (?<remote_addr>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server}?)(?:, request: %{QS:request})?(?:, upstream: (?<upstream>\"%{URI}\"|%{QS}))?(?:, host: %{QS:request_host})?(?:, referrer: \"%{URI:referrer}\")?

The analysis results are as follows:

{
  "timestamp": [
    [
      "2018/11/01 23:30:39"
    ]
  ],
  "YEAR": [
    [
      "2018"
    ]
  ],
  "MONTHNUM": [
    [
      "11"
    ]
  ],
  "MONTHDAY": [
    [
      "01"
    ]
  ],
  "TIME": [
    [
      "23:30:39"
    ]
  ],
  "HOUR": [
    [
      "23"
    ]
  ],
  "MINUTE": [
    [
      "30"
    ]
  ],
  "SECOND": [
    [
      "39"
    ]
  ],
  "severity": [
    [
      "error"
    ]
  ],
  "pid": [
    [
      "15105"
    ]
  ],
  "NUMBER": [
    [
      "0"
    ]
  ],
  "BASE10NUM": [
    [
      "0"
    ]
  ],
  "errormessage": [
    [
      "*397937824 connect() failed (111: Connection refused) while connecting to upstream"
    ]
  ],
  "remote_addr": [
    [
      "10.48.xxx.3"
    ]
  ],
  "IP": [
    [
      "10.48.xxx.3",
      null,
      null,
      null
    ]
  ],
  "IPV6": [
    [
      null,
      null,
      null,
      null
    ]
  ],
  "IPV4": [
    [
      "10.48.xxx.3",
      null,
      null,
      null
    ]
  ],
  "HOSTNAME": [
    [
      null,
      "127.0.0.1",
      "127.0.0.1",
      null
    ]
  ],
  "server": [
    [
      "127.0.0.1"
    ]
  ],
 ...
}
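
One more detail worth adding to the Logstash filter: by default @timestamp is the time Logstash processed the event, not the time in the log line. A date filter can promote the parsed field (a sketch; the format string matches the nginx timestamp above):

filter {
  date {
    # "timestamp" is the named capture from the grok pattern above
    match => [ "timestamp", "yyyy/MM/dd HH:mm:ss" ]
    target => "@timestamp"
  }
}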

Kibana chart panel

[Figure: Kibana dashboard with monitoring panels]

As the figure shows, we can configure monitoring panels in Kibana, for example a panel that monitors exception logs.

Readers interested in configuring ELK monitoring panels can read this article:

https://blog.51cto.com/hnr520/1845900

