Using Logstash filters

Overview

Logstash's power and popularity are inseparable from its rich collection of filter plugins.

Filters do more than just filtering: they can apply complex processing logic to the raw data entering the filter stage, and can even generate new events for subsequent processing.

 

grok is a very powerful Logstash filter plugin. It can parse text in almost any format, and it is currently the best way for Logstash to parse unstructured log data.

 

 

Basic Usage

 

The grok matching syntax is:

%{SYNTAX:SEMANTIC}

SYNTAX is the name of the pattern to match against; for example, the NUMBER pattern matches numbers, and IP matches IP addresses such as 127.0.0.1:

%{NUMBER:lasttime}%{IP:client}

By default, every captured SEMANTIC is saved as a string, but you can append a data type conversion:

%{NUMBER:lasttime:int}%{IP:client}

Currently only int and float conversions are supported.
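
For instance, a minimal sketch using these two patterns (it assumes the input separates the number and the IP with a space):

filter {
    grok {
        match => { "message" => "%{NUMBER:lasttime:int} %{IP:client}" }
    }
}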

 

 

Overwriting fields - overwrite

 

The overwrite parameter of grok lets a parsed value overwrite an existing field in the log event:

filter {
    grok {
        match => { "message" => "%{SYSLOGBASE} %{DATA:message}" }
        overwrite => [ "message" ]
    }
}

After this filter runs, the message field of the log is overwritten with the value captured by %{DATA:message}.
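
For example, for a syslog-style line such as:

May 29 16:37:11 sadness logger: hello world

the syslog metadata is parsed into its own fields and message ends up containing only "hello world".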

 

 

Examples

 

Take the following log line, which is in fact an HTTP request record:

 

55.3.244.1 GET /index.html 15824 0.043

We can use the following Logstash configuration:

input {
    file {
        path => "/var/log/http.log"
    }
}
filter {
    grok {
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
    }
}

 

In the collected event we can see the following fields:

client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043

In this way, unstructured data is turned into structured output.

 

grok is implemented on top of regular expressions (using the Oniguruma library), so it can also match any custom regular expression.

 

 

Creating custom patterns

 

You can extract a log field with your own regular expression, using the named-capture syntax:

(?<field_name>the pattern here)
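
For example, this sketch captures a 10- or 11-character hexadecimal queue id inline (the field name queue_id is our own choice):

filter {
    grok {
        match => { "message" => "(?<queue_id>[0-9A-F]{10,11})" }
    }
}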

To make such a pattern reusable, first create a pattern file and write the regular expression in it:

# contents of ./patterns/postfix:
POSTFIX_QUEUEID [0-9A-F]{10,11}

Then reference the pattern directory in your Logstash configuration:

filter {
    grok {
        patterns_dir => "./patterns"
        match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
    }
}

For the log line:

Jan  1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=<[email protected]>

the following fields are extracted:

timestamp: Jan 1 06:25:43

logsource: mailserver14

program: postfix/cleanup

pid: 21403

queue_id: BEF25A72965

syslog_message: message-id=<[email protected]>

 

IP geolocation plugin - geoip

In Logstash 1.3.0 and above, the geoip plugin can be used to obtain the geographic location corresponding to an IP address in a message. This is very useful for things like analyzing source IPs in access logs.

 

 

Usage

geoip {
    source => ...
}

Examples

filter {
    geoip {
        source => "message"
    }
}

Result:

{
    "message" => "183.60.92.253",
    "@version" => "1",
    "@timestamp" => "2014-08-07T10:32:55.610Z",
    "host" => "raochenlindeMacBook-Air.local",
    "geoip" => {
        "ip" => "183.60.92.253",
        "country_code2" => "CN",
        "country_code3" => "CHN",
        "country_name" => "China",
        "continent_code" => "AS",
        "region_name" => "30",
        "city_name" => "Guangzhou",
        "latitude" => 23.11670000000001,
        "longitude" => 113.25,
        "timezone" => "Asia/Chongqing",
        "real_region_name" => "Guangdong",
        "location" => [
            [0] 113.25,
            [1] 23.11670000000001
        ]
    }
}

As we can see, Logstash extracts the IP from the message field and resolves a series of geolocation details from it.

Since quite a lot of data is parsed out, you can use the fields option to select just the fields you need:

filter {
    geoip {
        fields => ["city_name", "continent_code", "country_code2", "country_code3", "country_name", "dma_code", "ip", "latitude", "longitude", "postal_code", "region_name", "timezone"]
    }
}

Options

 

Besides the source and fields options seen above, geoip also provides the following options:

Options provided by geoip:

Option         | Type   | Required | Default | Meaning
add_field      | hash   | no       | {}      | Add a field to the current event
add_tag        | array  | no       | []      | Add a tag to the current event
database       | path   | no       | -       | Path of the file containing the location database
fields         | array  | no       | -       | Select which geoip fields to return in the result
lru_cache_size | int    | no       | 1000    | Size of the cache used by geoip
periodic_flush | bool   | no       | false   | Whether to call the flush method periodically
remove_field   | array  | no       | []      | Remove a field from the result
remove_tag     | array  | no       | []      | Remove a tag from the result
source         | string | yes      | -       | Name of the field containing the IP to resolve
target         | string | no       | "geoip" | Name of the field in which geoip stores its result
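
For example, a sketch that reads the IP from a hypothetical clientip field and stores the result under a custom field (the database path is an assumption; point it at your actual GeoIP database file):

filter {
    geoip {
        source => "clientip"                      # hypothetical field holding the IP
        target => "client_geo"                    # store results here instead of "geoip"
        database => "/path/to/GeoLiteCity.dat"    # assumption: your local database file
    }
}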

 

json

Logs that are entirely in JSON format can be decoded with the json codec. When only part of the record is JSON, however, you need the json filter plugin instead.
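
As a sketch of the codec case (the tcp input and port are arbitrary choices for illustration):

input {
    tcp {
        port => 5000      # arbitrary port for illustration
        codec => json     # decode each whole message as JSON
    }
}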

Examples

filter {
    json {
        source => "message"
        target => "jsoncontent"
    }
}

Result:

{
    "@version": "1",
    "@timestamp": "2014-11-18T08:11:33.000Z",
    "host": "web121.mweibo.tc.sinanode.com",
    "message": "{\"uid\":3081609001,\"type\":\"signal\"}",
    "jsoncontent": {
        "uid": 3081609001,
        "type": "signal"
    }
}

In the example above, the parsed result is placed under the node named by target. If you would rather keep the parsed fields at the same output level as the other fields, simply omit target:
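
filter {
    json {
        source => "message"
    }
}

The output then becomes: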

{
    "@version": "1",
    "@timestamp": "2014-11-18T08:11:33.000Z",
    "host": "web121.mweibo.tc.sinanode.com",
    "message": "{\"uid\":3081609001,\"type\":\"signal\"}",
    "uid": 3081609001,
    "type": "signal"
}

Splitting events - split

Just as multiline lets Logstash combine multiple lines of data into one event, Logstash also supports turning one line of data into multiple events.

Logstash provides the split plugin for splitting one line of data into multiple events.

 

Example:

filter {
    split {
        field => "message"
        terminator => "#"
    }
}

Result:

For "test1 # test2", above logstash configured to turn it into the following two events:

{
    "@version": "1",
    "@timestamp": "2014-11-18T08:11:33.000Z",
    "host": "web121.mweibo.tc.sinanode.com",
    "message": "test1"
}
{
    "@version": "1",
    "@timestamp": "2014-11-18T08:11:33.000Z",
    "host": "web121.mweibo.tc.sinanode.com",
    "message": "test2"
}

 

Note that when the split plugin finishes, the event goes directly to the output stage; any filters that come after split will not be executed.

 

Logstash also supports modifying the data inside an event in the filter stage, using the mutate plugin.

 

 

Rename - rename

 

Renames an existing field:

filter {
    mutate {
        rename => ["syslog_host", "host"]
    }
}

Update field content - update

 

Updates the content of a field; if the field does not exist, it will not be created:

filter {
    mutate {
        update => { "sample" => "My new message" }
    }
}

Replace field contents - replace

 

Same function as update, except that replace creates the field if it does not already exist:

filter {
    mutate {
        replace => { "message" => "%{source_host}: My new message" }
    }
}

Data type conversion - convert

filter {
    mutate {
        convert => ["request_time", "float"]
    }
}

Text replacement - gsub

 

gsub provides text replacement, implemented with regular expressions:

filter {
    mutate {
        gsub => [
            # replace all forward slashes with underscore
            "fieldname", "/", "_",
            # replace backslashes, question marks, hashes, and minuses
            # with a dot "."
            "fieldname2", "[\\?#-]", "."
        ]
    }
}

Case conversion - uppercase, lowercase

filter {
    mutate {
        uppercase => [ "fieldname" ]
    }
}
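
lowercase works the same way:

filter {
    mutate {
        lowercase => [ "fieldname" ]
    }
}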

Remove whitespace characters - strip

 

Similar to trim in PHP: removes whitespace from the beginning and end of the field only:

filter {
    mutate {
        strip => ["field1", "field2"]
    }
}

Delete field - remove, remove_field

 

remove is deprecated; remove_field is recommended instead:

filter {
    mutate {
        remove_field => [ "foo_%{somefield}" ]
    }
}


Split field - split

 

Splits a field into an array on the given character:

filter {
    mutate {
        split => ["message", "|"]
    }
}

For the string "123|321|adfd|dfjld*=123", the output is:

{
    "message" => [
        [0] "123",
        [1] "321",
        [2] "adfd",
        [3] "dfjld*=123"
    ],
    "@version" => "1",
    "@timestamp" => "2014-08-20T15:58:23.120Z",
    "host" => "raochenlindeMacBook-Air.local"
}

Joining arrays - join

 

Joins the elements of an array-type field into a single string, using the specified character as the separator.

 

For example, we can re-join the result of a split:

filter {
    mutate {
        split => ["message", "|"]
    }
    mutate {
        join => ["message", ","]
    }
}

Output:

{
    "message" => "123,321,adfd,dfjld*=123",
    "@version" => "1",
    "@timestamp" => "2014-08-20T16:01:33.972Z",
    "host" => "raochenlindeMacBook-Air.local"
}

Merging arrays - merge

 

Fields of type array, hash, or string can be combined with merge:

filter {
    mutate {
        merge => [ "dest_field", "added_field" ]
    }
}

Note that an array field and a hash field cannot be merged with each other.

 
