Problems encountered when parsing logs with grok in Logstash, and with Kibana

Reprinted from
http://xiaorui.cc/2015/01/27/logstash%E4%BD%BF%E7%94%A8grok%E6%AD%A3%E5%88%99%E8%A7%A3%E6%9E%90%E6%97%A5%E5%BF%97%E9%81%87%E5%88%B0%E7%9A%84%E9%97%AE%E9%A2%98/


Damn, time for a change. I'm switching to logstash for two reasons: scribe really can't cope, and the product manager needs me to build a customizable dashboard and charting system.

I hadn't worked with the ELK stack for a long time and had forgotten the logstash syntax. Since our crawler's log format is our own, we have to write the regular expressions ourselves.


Logstash ships with built-in regex patterns for many programs, such as nginx, haproxy, apache and tomcat, but you have to specify the type yourself.


Source of this article: http://xiaorui.cc (http://xiaorui.cc/?p=1055)
Then came the problem: it seems the type can't be set arbitrarily. At first I carelessly set it to nginx-access, and as a result the grok regex in the filter never matched at all, which was very annoying...

Once I finally removed that type, the lines matched normally.

Whether you use grep or grok, you can test your regex matching at http://grokdebug.herokuapp.com/.

(Screenshot: http://xiaorui.cc/wp-content/uploads/2015/01/20150127153051_76366.png)
Here is the logstash agent.conf configuration I tested:

input {
    file {
        type => "producer"
        path => "/home/ruifengyun/buzzspider/spider/spider.log"
    }
}
filter {
    grok {
        # Split a crawler log line into datetime, level, url, httpcode and cost fields
        match => [ "message", "\[(?<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3})\]\s(?<level>\w*)\s\"Crawl\surl:(?<url>(.*)) (?<httpcode>[0-9]{2,3})\stakes (?<cost>\d\.\d\d).*" ]
    }
}
output {
    # Push events onto a Redis list
    redis {
        host => "123.116.x.x"
        data_type => "list"
        key => "logstash:demo"
    }
    # Also print each event to the terminal for debugging
    stdout { codec => rubydebug }
}




Output shown in the terminal:
{
"message" => "[2015-01-27 14:56:55,613] INFO \"Crawl url:http://weixin.sogou.com/weixin?query=%E4%B9%90%E6%92%AD%E8%AF%97&tsn=1&interation=&type=2&interV=kKIOkrELjbkRmLkElbkTkKIMkrELjboImLkEk74TkKIRmLkEk78TkKILkbELjboN_105333196&ie=utf8&page=7&p=40040100&dp=1&num=100 200 takes 0.085 seconds, refer:, depth:2\"",
    "@timestamp" => "2015-01-27T06:56:56.359Z",
      "@version" => "1",
          "type" => "producer",
          "host" => "bj-buzz-dev01",
          "path" => "/home/ruifengyun/buzzspider/spider/spider.log",
      "datetime" => "2015-01-27 14:56:55,613",
         "level" => "INFO",
           "url" => "http://weixin.sogou.com/weixin?query=%E4%B9%90%E6%92%AD%E8%AF%97&tsn=1&interation=&type=2&interV=kKIOkrELjbkRmLkElbkTkKIMkrELjboImLkEk74TkKIRmLkEk78TkKILkbELjboN_105333196&ie=utf8&page=7&p=40040100&dp=1&num=100",
      "httpcode" => "200",
          "cost" => "0.08"
}
{
       "message" => "[2015-01-27 14:56:55,637] INFO \"Crawl url:http://dealer.autohome.com.cn/8178/newslist.html 200 takes 0.146 seconds, refer:, depth:2\"",
    "@timestamp" => "2015-01-27T06:56:56.359Z",
      "@version" => "1",
          "type" => "producer",
          "host" => "bj-buzz-dev01",
          "path" => "/home/ruifengyun/buzzspider/spider/spider.log",
      "datetime" => "2015-01-27 14:56:55,637",
         "level" => "INFO",
           "url" => "http://dealer.autohome.com.cn/8178/newslist.html",
      "httpcode" => "200",
          "cost" => "0.14"
}
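To check the pattern outside Logstash, here is a minimal Python sketch that applies the same regex to the second line above. Python's `re` module writes named groups as `(?P<name>...)` instead of `(?<name>...)`, so only the group syntax is translated, and the decimal point in `cost` is escaped so it matches a literal dot. The function name `parse_crawl_line` is hypothetical.

```python
import re
from typing import Optional

# The agent.conf pattern, rewritten with Python's (?P<name>...) group syntax.
CRAWL_LINE = re.compile(
    r'\[(?P<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3})\]\s'
    r'(?P<level>\w*)\s"Crawl\surl:(?P<url>.*) (?P<httpcode>[0-9]{2,3})\s'
    r'takes (?P<cost>\d\.\d\d).*'
)


def parse_crawl_line(line: str) -> Optional[dict]:
    """Return the captured fields, or None if the line does not match
    (in Logstash such an event would be tagged _grokparsefailure)."""
    m = CRAWL_LINE.search(line)  # search, not match: the regex may start anywhere
    return m.groupdict() if m else None


line = ('[2015-01-27 14:56:55,637] INFO "Crawl url:http://dealer.autohome.com.cn/8178/newslist.html '
        '200 takes 0.146 seconds, refer:, depth:2"')
print(parse_crawl_line(line))
```

Note that `\d\.\d\d` keeps only two decimals, which is why `cost` shows "0.14" for "0.146 seconds"; something like `\d+\.\d+` would keep the full value.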

Here is what we see in the Kibana 3 interface: how many times I crawled sogou.com within the selected time period.



One problem I ran into with Kibana was not knowing how to write the search query.


I tried url:*.sogou.com*, thinking I could write a plain regex. But the Kibana backend uses Elasticsearch query syntax, so your query has to follow Elasticsearch's rules, where * is a wildcard rather than a regex quantifier.
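To make the wildcard-versus-regex difference concrete, here is a hypothetical Python helper that translates a Lucene-style wildcard term (`*` = any run of characters, `?` = exactly one, everything else literal) into an equivalent regex. It is only an approximation: it ignores backslash escapes, and Elasticsearch matches wildcards against the indexed terms of a field, so real hits also depend on how `url` was analyzed.

```python
import re


def wildcard_to_regex(term: str) -> "re.Pattern[str]":
    """Translate a Lucene-style wildcard term into a compiled Python regex (hypothetical helper)."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(".*")           # any sequence of characters, possibly empty
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # '.' in "sogou.com" stays a literal dot
    return re.compile("".join(parts), re.DOTALL)


query = wildcard_to_regex("*.sogou.com*")
print(bool(query.fullmatch("weixin.sogou.com")))  # a wildcard must cover the whole term
print(bool(query.fullmatch("sogou.com")))         # no "." before "sogou", so no match

try:
    re.compile("*.sogou.com*")  # the same text is not even a valid regex
except re.error as exc:
    print("invalid regex:", exc)
```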

(Screenshot: http://xiaorui.cc/wp-content/uploads/2015/01/20150127153034_72934.png)
################
The official documentation also describes grok in detail; here is a quick translation of the official logstash article on grok.

Example

Here is what the log line looks like:
55.3.244.1 GET /index.html 15824 0.043

A grok pattern for it:
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

How is this written in the configuration file?

input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
  }
}

What does the result look like after parsing?

client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
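Conceptually, grok replaces each `%{NAME:field}` with the regex defined under NAME, wraps it in a capture group named field, and runs the combined regex against the message. The Python sketch below imitates that substitution for the example above. `PATTERNS` is a hand-picked subset: `URIPATH` and `URIPARAM` are copied from the built-in list, while `IP` and `NUMBER` are simplified stand-ins (an assumption, IPv4 only and no atomic groups) so that Python's `re` accepts them. `expand` and `grok_match` are hypothetical names, not Logstash APIs.

```python
import re
from typing import Dict, Optional

PATTERNS: Dict[str, str] = {
    "IP": r"(?:\d{1,3}(?:\.\d{1,3}){3})",                  # simplified: IPv4 only
    "WORD": r"\b\w+\b",
    "NUMBER": r"(?:[+-]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+))",  # simplified BASE10NUM
    "URIPATH": r"(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+",
    "URIPARAM": r"\?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]*",
    "URIPATHPARAM": r"%{URIPATH}(?:%{URIPARAM})?",
}

# Matches %{NAME} or %{NAME:field}
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern: str, patterns: Dict[str, str] = PATTERNS) -> str:
    """Recursively replace %{NAME:field} references with plain regex text."""
    def repl(m: "re.Match[str]") -> str:
        name, field = m.group(1), m.group(2)
        body = expand(patterns[name], patterns)  # definitions may reference other definitions
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(repl, pattern)


def grok_match(pattern: str, line: str) -> Optional[Dict[str, str]]:
    m = re.search(expand(pattern), line)
    return m.groupdict() if m else None


fields = grok_match(
    "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}",
    "55.3.244.1 GET /index.html 15824 0.043",
)
print(fields)
```

As in grok without a type suffix, every captured value comes back as a string.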


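Custom patterns


If none of the built-in patterns fits, you can write a named capture directly, using the Oniguruma syntax:

(?<field_name>the pattern here)

For example, to capture a postfix queue ID of 10 or 11 hexadecimal characters:

(?<queue_id>[0-9A-F]{10,11})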
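To try a custom capture like this outside Logstash, note that Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`. The sketch below rewrites the group syntax and then applies the queue-ID pattern; the helper `onig_to_python` and the sample log text are made up for illustration.

```python
import re

# Matches "(?<name>". \w+ cannot start with '=' or '!', so the lookbehind
# forms "(?<=" and "(?<!" are left untouched.
ONIG_GROUP = re.compile(r"\(\?<(\w+)>")


def onig_to_python(pattern: str) -> str:
    """Rewrite Oniguruma-style named groups (?<name>...) as Python's (?P<name>...)."""
    return ONIG_GROUP.sub(r"(?P<\1>", pattern)


queue_id = re.compile(onig_to_python(r"(?<queue_id>[0-9A-F]{10,11})"))
m = queue_id.search("postfix/cleanup[21403]: BEF25A72965: message-id=<abc@example.com>")
print(m.group("queue_id") if m else None)
```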

Alternatively, you can collect your custom patterns in a separate file and reference them by name:
# in ./patterns/postfix
POSTFIX_QUEUEID [0-9A-F]{10,11}

filter {
  grok {
    patterns_dir => "./patterns"
    match => [ "message", "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" ]
  }
}
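A patterns file such as ./patterns/postfix is plain text with one definition per line: the pattern name, whitespace, then the regex, with `#` lines as comments. The Python sketch below reads that format into a dictionary, assuming each line splits at its first run of whitespace; `load_patterns` is a hypothetical name, and the last lines show how the loaded definition would be used, as a named group, much like `%{POSTFIX_QUEUEID:queue_id}`.

```python
import re
from typing import Dict


def load_patterns(text: str) -> Dict[str, str]:
    """Parse pattern-file text: 'NAME regex' per line; '#' comments and blank lines are skipped."""
    patterns: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split at the first run of whitespace only
        if len(parts) != 2:
            raise ValueError(f"malformed pattern line: {raw!r}")
        name, regex = parts
        patterns[name] = regex
    return patterns


postfix = load_patterns("""
# in ./patterns/postfix
POSTFIX_QUEUEID [0-9A-F]{10,11}
""")
print(postfix)

# Wrap the loaded definition in a named group called queue_id.
queue_re = re.compile(r"(?P<queue_id>%s)" % postfix["POSTFIX_QUEUEID"])
print(queue_re.search("postfix/cleanup[21403]: BEF25A72965: message-id=<abc@example.com>").group("queue_id"))
```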






Logstash already ships with a large set of patterns. If you want to save effort, reuse the built-in ones, listed below:

USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b

POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
QUOTEDSTRING (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}

# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
IPV4 (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
IP (?:%{IPV6}|%{IPV4})
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
HOST %{HOSTNAME}
IPORHOST (?:%{HOSTNAME}|%{IP})
HOSTPORT (?:%{IPORHOST=~/\./}:%{POSINT})

# paths
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (?>/(?>[\w_%!$@:.,-]+|\\.)*)+
TTY (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
WINPATH (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
URIHOST %{IPORHOST}(?::%{POSINT:port})?
# uripath comes loosely from RFC1738, but mostly from what Firefox
# doesn’t turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+
#URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
URIPARAM \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]*
URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?

# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])

# Days: Monday, Tue, Thu, etc…
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)

# Years?
YEAR (?>\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5][0-9]|60)(?:[:.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[PMCE][SD]T|UTC)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}

# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG (?:[\w._/%-]+)
SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY <%{NONNEGINT:facility}.%{NONNEGINT:priority}>
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}

# Shortcuts
QS %{QUOTEDSTRING}

# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}

# Log Levels
LOGLEVEL ([A-a]lert|ALERT|[T|t]race|TRACE|[D|d]ebug|DEBUG|[N|n]otice|NOTICE|[I|i]nfo|INFO|[W|w]arn?(?:ing)?|WARN?(?:ING)?|[E|e]rr?(?:or)?|ERR?(?:OR)?|[C|c]rit?(?:ical)?|CRIT?(?:ICAL)?|[F|f]atal|FATAL|[S|s]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
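Many of these definitions are built from one another: `SYSLOGTIMESTAMP` is `%{MONTH} +%{MONTHDAY} %{TIME}`, and `TIME` in turn refers to `HOUR`, `MINUTE` and `SECOND`. The Python sketch below copies those definitions verbatim from the list and resolves the references recursively; `resolve` is a hypothetical helper. Python's `re` accepts these particular definitions, but not every one in the list (for example `YEAR` and `BASE10NUM` use atomic groups `(?>...)`, which `re` only supports from Python 3.11).

```python
import re
from typing import Dict

# Copied verbatim from the built-in pattern list.
DEFS: Dict[str, str] = {
    "MONTH": r"\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b",
    "MONTHDAY": r"(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])",
    "HOUR": r"(?:2[0123]|[01]?[0-9])",
    "MINUTE": r"(?:[0-5][0-9])",
    "SECOND": r"(?:(?:[0-5][0-9]|60)(?:[:.,][0-9]+)?)",
    "TIME": r"(?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])",
    "SYSLOGTIMESTAMP": r"%{MONTH} +%{MONTHDAY} %{TIME}",
}

REF = re.compile(r"%\{(\w+)\}")


def resolve(name: str) -> str:
    """Expand every %{NAME} inside a definition until only plain regex is left."""
    return REF.sub(lambda m: "(?:" + resolve(m.group(1)) + ")", DEFS[name])


syslog_ts = re.compile(resolve("SYSLOGTIMESTAMP"))
print(bool(syslog_ts.fullmatch("Jan  1 06:25:43")))
print(bool(syslog_ts.fullmatch("Jan 32 06:25:43")))
```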



