Parsing ELK Logs with Grok

<div id="article_content" class="article_content clearfix csdn-tracking-statistics" data-pid="blog" data-mod="popu_307" data-dsm="post">
                                <div class="article-copyright">
                    Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission (http://blog.csdn.net/napoay)                    https://blog.csdn.net/napoay/article/details/62885899                </div>
                                            <div class="markdown_views">
                            <!-- flowchart arrow icon, do not delete -->
                            <svg xmlns="http://www.w3.org/2000/svg" style="display: none;"><path stroke-linecap="round" d="M5,0 0,2.5 5,5z" id="raphael-marker-block" style="-webkit-tap-highlight-color: rgba(0, 0, 0, 0);"></path></svg>
                            <h1 id="一简介"><a name="t0"></a><strong>1. Introduction</strong></h1>

<blockquote>
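  <p>Grok is currently the best way to turn messy, unstructured logs into structured, queryable data. It works well on syslog logs, Apache and other web server logs, MySQL logs, and log files in most other formats.</p>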
</blockquote>

<p>Grok ships with more than 120 built-in regular-expression patterns, available at: <a href="https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns" rel="nofollow" target="_blank">https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns</a>.</p>

<h1 id="二入门例子"><a name="t1"></a><strong>2. A First Example</strong></h1>

<p>Here is a Tomcat log entry (in Apache combined log format):</p>

<pre class="prettyprint" name="code"><code class="hljs 1c has-numbering">83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
</code></pre>

<p>Filebeat ships the log to Logstash, which is configured as follows:</p>

<pre class="prettyprint" name="code"><code class="hljs php has-numbering">input {
    beats {
        port =&gt; <span class="hljs-string">"5043"</span>
    }
}
filter {
    grok {
        match =&gt; { <span class="hljs-string">"message"</span> =&gt; <span class="hljs-string">"%{COMBINEDAPACHELOG}"</span>}
    }
}
output {
    stdout { codec =&gt; rubydebug }
}</code></pre>

<p>In the filter, <code>message</code> is the field that holds each log line, and <code>%{COMBINEDAPACHELOG}</code> is the Grok pattern used to parse it. The definition of COMBINEDAPACHELOG is at: <a href="https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/httpd" rel="nofollow" target="_blank">https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/httpd</a>. The parsed event looks like this:</p>

<pre class="prettyprint" name="code"><code class="hljs php has-numbering">{
        <span class="hljs-string">"request"</span> =&gt; <span class="hljs-string">"/presentations/logstash-monitorama-2013/images/kibana-search.png"</span>,
          <span class="hljs-string">"agent"</span> =&gt; <span class="hljs-string">"\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""</span>,
         <span class="hljs-string">"offset"</span> =&gt; <span class="hljs-number">325</span>,
           <span class="hljs-string">"auth"</span> =&gt; <span class="hljs-string">"-"</span>,
          <span class="hljs-string">"ident"</span> =&gt; <span class="hljs-string">"-"</span>,
     <span class="hljs-string">"input_type"</span> =&gt; <span class="hljs-string">"log"</span>,
           <span class="hljs-string">"verb"</span> =&gt; <span class="hljs-string">"GET"</span>,
         <span class="hljs-string">"source"</span> =&gt; <span class="hljs-string">"/path/to/file/logstash-tutorial.log"</span>,
        <span class="hljs-string">"message"</span> =&gt; <span class="hljs-string">"83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""</span>,
           <span class="hljs-string">"type"</span> =&gt; <span class="hljs-string">"log"</span>,
           <span class="hljs-string">"tags"</span> =&gt; [
        [<span class="hljs-number">0</span>] <span class="hljs-string">"beats_input_codec_plain_applied"</span>
    ],
       <span class="hljs-string">"referrer"</span> =&gt; <span class="hljs-string">"\"http://semicomplete.com/presentations/logstash-monitorama-2013/\""</span>,
     <span class="hljs-string">"@timestamp"</span> =&gt; <span class="hljs-number">2016</span>-<span class="hljs-number">10</span>-<span class="hljs-number">11</span>T21:<span class="hljs-number">04</span>:<span class="hljs-number">36.167</span>Z,
       <span class="hljs-string">"response"</span> =&gt; <span class="hljs-string">"200"</span>,
          <span class="hljs-string">"bytes"</span> =&gt; <span class="hljs-string">"203023"</span>,
       <span class="hljs-string">"clientip"</span> =&gt; <span class="hljs-string">"83.149.9.216"</span>,
       <span class="hljs-string">"@version"</span> =&gt; <span class="hljs-string">"1"</span>,
           <span class="hljs-string">"beat"</span> =&gt; {
        <span class="hljs-string">"hostname"</span> =&gt; <span class="hljs-string">"My-MacBook-Pro.local"</span>,
            <span class="hljs-string">"name"</span> =&gt; <span class="hljs-string">"My-MacBook-Pro.local"</span>
    },
           <span class="hljs-string">"host"</span> =&gt; <span class="hljs-string">"My-MacBook-Pro.local"</span>,
    <span class="hljs-string">"httpversion"</span> =&gt; <span class="hljs-string">"1.1"</span>,
      <span class="hljs-string">"timestamp"</span> =&gt; <span class="hljs-string">"04/Jan/2015:05:13:42 +0000"</span>
}</code></pre>
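
<p>To make concrete what <code>%{COMBINEDAPACHELOG}</code> does, here is a minimal Python sketch that extracts the same field names with one hand-written regular expression. The regex is a simplified assumption for illustration, not the actual Grok definition, and unlike the output above it captures <code>referrer</code> and <code>agent</code> without the surrounding quotes.</p>

```python
import re

# Simplified stand-in for %{COMBINEDAPACHELOG} (an assumption for illustration;
# the real Grok definition is more permissive, e.g. for malformed requests).
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The sample log entry from this section, written as a single line.
line = (
    '83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] '
    '"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" '
    '200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"'
)

match = COMBINED_LOG.match(line)
# Every named group becomes one field, just as Grok adds one field per capture.
fields = match.groupdict() if match else {}

for name, value in fields.items():
    print(name, "=>", value)
```

<p>All captured values are strings, which matches the quoted <code>"response"</code> and <code>"bytes"</code> values in the Logstash output above.</p>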

<p>As another example, take this log line:</p>

<pre class="prettyprint" name="code"><code class="hljs vbnet has-numbering">55.3.244.1 GET /index.html 15824 0.043</code></pre>

<p>The line splits into five parts: <code>IP (55.3.244.1)</code>, <code>method (GET)</code>, <code>request path (/index.html)</code>, <code>bytes (15824)</code>, and <code>duration (0.043)</code>. The Grok pattern that matches it is:</p>

<pre class="prettyprint" name="code"><code class="hljs css has-numbering">%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}</code></pre>

<p>Placed in the filter:</p>

<pre class="prettyprint" name="code"><code class="hljs php has-numbering">filter {
    grok {
        match =&gt; { <span class="hljs-string">"message"</span> =&gt; <span class="hljs-string">"%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"</span>}
    }
}</code></pre>

<p>After parsing:</p>

<pre class="prettyprint" name="code"><code class="hljs oxygene has-numbering">client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043</code></pre>
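
<p>The <code>%{SYNTAX:SEMANTIC}</code> notation can be read as "match with the pattern named SYNTAX and store the result in the field named SEMANTIC". The Python sketch below illustrates that idea by expanding each reference into a named capture group. The <code>PATTERNS</code> table and the <code>grok_to_regex</code> helper are hypothetical and greatly simplified; they are not the definitions or the mechanism Logstash itself uses.</p>

```python
import re

# Hypothetical, simplified stand-ins for a few Grok patterns; the real
# definitions in logstash-patterns-core are stricter and more complete.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"/\S*",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
}


def grok_to_regex(expression):
    """Expand each %{SYNTAX:SEMANTIC} reference into a named capture group."""
    def expand(ref):
        syntax, semantic = ref.group(1), ref.group(2)
        return "(?P<{}>{})".format(semantic, PATTERNS[syntax])

    # Text between the references (here, single spaces) is left untouched.
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))


pattern = grok_to_regex(
    "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
    "%{NUMBER:bytes} %{NUMBER:duration}"
)
match = pattern.match("55.3.244.1 GET /index.html 15824 0.043")
fields = match.groupdict() if match else {}
print(fields)
```

<p>As in Grok, every captured value is a string by default. In Logstash, a conversion suffix such as <code>%{NUMBER:bytes:int}</code> can be used when a numeric field is needed.</p>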

<h1 id="三解析任意格式日志"><a name="t2"></a><strong>3. Parsing Logs in Arbitrary Formats</strong></h1>

<p>The steps for parsing a log in an arbitrary format are:</p>

<ol>
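<li>First decide how the log line should be split, that is, which parts it consists of.</li>
<li>Analyze each part. If a built-in Grok pattern fits, use it directly; if none does, write a custom pattern.</li>
<li>Learn to debug patterns in <a href="http://grokdebug.herokuapp.com/" rel="nofollow" target="_blank">Grok Debugger</a>.</li>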
</ol>

<p>Here is an example with two log lines:</p>

<pre class="prettyprint" name="code"><code class="hljs css has-numbering">2017-03-07 00:03:44,373 4191949560 [          CASFilter.java:330:DEBUG]  entering doFilter()

2017-03-16 00:00:01,641 133383049 [    UploadFileModel.java:234:INFO ]  上报内容准备写入文件</code></pre>

<p>How the second line is split:</p>

<pre class="prettyprint" name="code"><code class="hljs css has-numbering">2017-03-16 00:00:01,641: timestamp
133383049: numeric ID
UploadFileModel.java: Java class file name
234: source line number
INFO: log level
上报内容准备写入文件: log message
</code></pre>

<p>The first five fields use existing Grok patterns: <code>TIMESTAMP_ISO8601</code>, <code>NUMBER</code>, <code>JAVAFILE</code>, <code>NUMBER</code>, and <code>LOGLEVEL</code>. The last field uses a custom regular expression: everything after the <code>]</code> that follows the log level, whether Chinese or English, is treated as the log message. The syntax for a custom pattern is:</p>

<pre class="prettyprint" name="code"><code class="hljs clojure has-numbering">(?&lt;field_name&gt;the pattern here)</code></pre>

<p>The last field is named <code>info</code>, and its regular expression is:</p>

<pre class="prettyprint" name="code"><code class="hljs clojure has-numbering">(?&lt;info&gt;([\s\S]*))</code></pre>

<p>The complete pattern for the two log lines above is shown below; <code>\s*</code> absorbs the variable amount of whitespace between fields.</p>

<pre class="prettyprint" name="code"><code class="hljs ruby has-numbering">\s*%{TIMESTAMP_ISO8601:time}\s*%{NUMBER:num} \[\s*%{JAVAFILE:class}\s*\:\s*%{NUMBER:lineNumber}\s*\:%{LOGLEVEL:level}\s*\]\s*(?&lt;info&gt;([\s\S]*))</code></pre>
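
<p>The same split can be tried out locally with a plain regular expression. The Python sketch below mirrors the field names of the pattern above, but each Grok pattern is replaced by a simplified stand-in, which is an assumption for illustration and not the actual Grok definition. Note that Python writes a named group as <code>(?P&lt;name&gt;...)</code>, whereas Grok uses <code>(?&lt;name&gt;...)</code>.</p>

```python
import re

# Simplified stand-ins for TIMESTAMP_ISO8601, NUMBER, JAVAFILE and LOGLEVEL
# (assumptions for illustration, not the actual Grok definitions).
LOG_LINE = re.compile(
    r"\s*(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r"\s*(?P<num>\d+) "
    r"\[\s*(?P<class>[A-Za-z0-9_$.]+)\s*:"
    r"\s*(?P<lineNumber>\d+)\s*:"
    r"(?P<level>[A-Z]+)\s*\]"
    r"\s*(?P<info>[\s\S]*)"  # everything after "]" is the log message
)

lines = [
    "2017-03-07 00:03:44,373 4191949560 [          CASFilter.java:330:DEBUG]  entering doFilter()",
    "2017-03-16 00:00:01,641 133383049 [    UploadFileModel.java:234:INFO ]  上报内容准备写入文件",
]

parsed = []
for line in lines:
    match = LOG_LINE.match(line)
    # A line that does not match is kept as None, loosely mirroring how
    # Logstash tags such events with _grokparsefailure.
    parsed.append(match.groupdict() if match else None)

for fields in parsed:
    print(fields)
```

<p>Both sample lines match, including the second one, where a space separates <code>INFO</code> from the closing bracket.</p>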

<p>Regex-based parsing is error-prone, so debugging the pattern in Grok Debugger is strongly recommended, as shown below.</p>

<p><img src="https://img-blog.csdn.net/20170317132230492?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvbmFwb2F5/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" alt="Debugging the pattern in Grok Debugger" title=""></p>
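
<p>A useful habit when debugging is to build the pattern up one field at a time and see where matching stops. The Python sketch below approximates that workflow offline. The helper <code>first_failing_segment</code> and the simplified regex segments are hypothetical and are not part of Grok or Grok Debugger; the last segment deliberately omits the <code>\s*</code> before <code>\]</code> to show the kind of mistake this catches.</p>

```python
import re


def first_failing_segment(segments, line):
    """Return the index of the first segment at which the cumulative
    pattern stops matching the start of `line`, or None if all match."""
    for end in range(1, len(segments) + 1):
        if re.match("".join(segments[:end]), line) is None:
            return end - 1
    return None


# Simplified stand-ins for the Grok patterns, one segment per field.
segments = [
    r"\s*(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})",
    r"\s*(?P<num>\d+) ",
    r"\[\s*(?P<class>[A-Za-z0-9_$.]+)\s*:",
    r"\s*(?P<lineNumber>\d+)\s*:",
    r"(?P<level>[A-Z]+)\]",  # bug on purpose: no \s* before the bracket
]

debug_line = "2017-03-07 00:03:44,373 4191949560 [          CASFilter.java:330:DEBUG]  entering doFilter()"
info_line = "2017-03-16 00:00:01,641 133383049 [    UploadFileModel.java:234:INFO ]  上报内容准备写入文件"

debug_result = first_failing_segment(segments, debug_line)
info_result = first_failing_segment(segments, info_line)
print(debug_result, info_result)
```

<p>The first line matches every segment, while the second fails at the last one because of the space in <code>INFO ]</code>, which is exactly why the full pattern needs <code>\s*</code> before the closing bracket.</p>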

<h1 id="四参考资料"><a name="t3"></a>4. References</h1>

<ol>
<li><a href="https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html" rel="nofollow" target="_blank">plugins-filters-grok</a></li>
<li><a href="https://www.elastic.co/guide/en/logstash/current/advanced-pipeline.html" rel="nofollow" target="_blank">Parsing Logs with Logstash</a></li>
</ol>            </div>
                        <link href="https://csdnimg.cn/release/phoenix/mdeditor/markdown_views-8cccb36679.css" rel="stylesheet">
                </div>


Reposted from blog.csdn.net/liyaohui_szz/article/details/82834506