Outline
Logstash's strength and popularity are inseparable from its rich collection of filter plug-ins.
Filters provide more than just filtering: they can apply complex processing logic to the raw data entering the filter stage, and can even generate new events for subsequent processing.
Powerful text analysis tool - Grok
grok is a very powerful Logstash filter plug-in. It can parse text in arbitrary formats, and it is currently the best way in Logstash to parse unstructured log data.
Basic Usage
The grok syntax rule is:
%{SYNTAX:SEMANTIC}
The "syntax" part is the name of the pattern to match against; for example, the NUMBER pattern matches numbers and the IP pattern matches IP addresses such as 127.0.0.1:
%{NUMBER:lasttime}%{IP:client}
By default, every "semantic" (the captured field) is saved as a string; you can also append a data type conversion:
%{NUMBER:lasttime:int}%{IP:client}
Currently only int and float conversions are supported.
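Putting the pieces together, a minimal sketch of a complete grok filter using the pattern above (assuming the two values are separated by a space in the raw message):
filter {
  grok {
    match => { "message" => "%{NUMBER:lasttime:int} %{IP:client}" }
  }
}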
Overwriting fields - overwrite
With grok's overwrite parameter, existing fields of the log event can be overwritten:
filter {
  grok {
    match => { "message" => "%{SYSLOGBASE} %{DATA:message}" }
    overwrite => ["message"]
  }
}
The message field in the log will be overwritten.
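As a worked illustration (borrowing the postfix log line from the custom-pattern section below): %{SYSLOGBASE} consumes the leading "Jan 1 06:25:43 mailserver14 postfix/cleanup[21403]:" part, so after the overwrite the message field holds only the remaining text instead of the entire original line.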
Examples
Take the following log line, which is in fact an HTTP request record:
55.3.244.1 GET /index.html 15824 0.043
In Logstash we can use the following configuration:
input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
In the collected result we can see:
client: 55.3.244.1 method: GET request: /index.html bytes: 15824 duration: 0.043
In this way, unstructured data is turned into structured output.
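When experimenting with grok patterns, a convenient minimal sketch is to read lines from stdin and print events with the rubydebug codec (both standard Logstash plug-ins), so each parsed field is visible immediately:
input { stdin { } }
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
output { stdout { codec => rubydebug } }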
Using regular expressions in Grok
grok is implemented on top of regular expressions (using the Oniguruma library), so it can also parse arbitrary regular expressions.
Creating custom patterns
The rule for extracting a log field with a regular expression is:
(?<field_name>the pattern here)
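Such a named capture can be used directly inside a match expression; a minimal sketch that grabs the 10-or-11-character hexadecimal postfix queue ID used later in this section, without any pattern file:
filter {
  grok {
    match => { "message" => "(?<queue_id>[0-9A-F]{10,11})" }
  }
}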
Alternatively, custom patterns can be collected in a pattern file. First, create the file with the regular expressions you need:
# contents of ./patterns/postfix:
POSTFIX_QUEUEID [0-9A-F]{10,11}
Then reference it in your Logstash configuration:
filter {
  grok {
    patterns_dir => "./patterns"
    match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
  }
}
For the log line:
Jan 1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=<[email protected]>
The following fields are matched:
timestamp: Jan 1 06:25:43
logsource: mailserver14
program: postfix/cleanup
pid: 21403
queue_id: BEF25A72965
syslog_message: message-id=<[email protected]>
IP geolocation plug-in - geoip
Logstash 1.3.0 and above can use the geoip plug-in to look up the geographic location corresponding to an IP address. For statistics over access logs and similar data that carry a source IP, this is very useful.
Usage
geoip { source => ... }
Examples
filter { geoip { source => "message" } }
Result:
{ "message" => "183.60.92.253", "@version" => "1", "@timestamp" => "2014-08-07T10:32:55.610Z", "host" => "raochenlindeMacBook-Air.local", "geoip" => { "ip" => "183.60.92.253", "country_code2" => "CN", "country_code3" => "CHN", "country_name" => "China", "continent_code" => "AS", "region_name" => "30", "city_name" => "Guangzhou", "latitude" => 23.11670000000001, "longitude" => 113.25, "timezone" => "Asia/Chongqing", "real_region_name" => "Guangdong", "location" => [ [0] 113.25, [1] 23.11670000000001 ] } }
As we can see, Logstash extracts the IP from the message field and resolves it into a series of geographic attributes.
Of course, since many attributes are parsed out, you can use the fields option to select just the ones you need:
filter {
  geoip {
    fields => ["city_name", "continent_code", "country_code2", "country_code3", "country_name", "dma_code", "ip", "latitude", "longitude", "postal_code", "region_name", "timezone"]
  }
}
Options
Above we saw the source and fields options; geoip also provides the following options:

Option | Type | Required | Default | Meaning
------ | ---- | -------- | ------- | -------
add_field | hash | no | {} | Add a field to the current event
add_tag | array | no | [] | Add a tag to the current event
database | path | no | none | Path to the file containing the geolocation database
fields | array | no | none | Fields to keep in the returned geoip result
lru_cache_size | int | no | 1000 | Size of the cache used by geoip
periodic_flush | bool | no | false | Whether to call the flush method periodically
remove_field | array | no | [] | Remove a field from the result
remove_tag | array | no | [] | Remove a tag from the result
source | string | yes | none | Name of the field containing the IP to resolve
target | string | no | "geoip" | Name of the field in which to store the geoip result
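A sketch combining several of these options; the field name clientip and the database path are assumptions for illustration, so substitute your own:
filter {
  geoip {
    source => "clientip"                     # assumed name of the field holding the IP
    target => "geoip"                        # default, shown explicitly
    database => "/path/to/GeoLiteCity.dat"   # hypothetical path to your database file
    add_tag => ["geoip-resolved"]
  }
}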
json
A log that is entirely JSON can be decoded with the json codec; but when only part of the record is JSON, you need the json filter plug-in instead.
Examples
filter {
  json {
    source => "message"
    target => "jsoncontent"
  }
}
Result:
{ "@version": "1", "@timestamp": "2014-11-18T08:11:33.000Z", "host": "web121.mweibo.tc.sinanode.com", "message": "{\"uid\":3081609001,\"type\":\"signal\"}", "jsoncontent": { "uid": 3081609001, "type": "signal" } }
In the example above, the parsed result is placed under the node named by target. If you want the parsed fields to sit at the same output level as the other fields, simply remove the target option:
{ "@version": "1", "@timestamp": "2014-11-18T08:11:33.000Z", "host": "web121.mweibo.tc.sinanode.com", "message": "{\"uid\":3081609001,\"type\":\"signal\"}", "uid": 3081609001, "type": "signal" }
Event splitting - split
multiline lets Logstash combine multiple lines of data into a single event; conversely, Logstash also supports turning one line of data into multiple events.
Logstash provides the split plug-in to split one line of data into multiple events.
Example:
filter {
  split {
    field => "message"
    terminator => "#"
  }
}
Result:
For the input "test1#test2", the configuration above produces the following two events:
{ "@version": "1", "@timestamp": "2014-11-18T08:11:33.000Z", "host": "web121.mweibo.tc.sinanode.com", "message": "test1" } { "@version": "1", "@timestamp": "2014-11-18T08:11:33.000Z", "host": "web121.mweibo.tc.sinanode.com", "message": "test2" }
Note that once the split plug-in finishes executing, the event goes directly to the output stage; any filters that follow split will not be executed.
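Consequently, any filter that must run on every event should be placed before split. A minimal sketch of the ordering (the strip call is just a placeholder for whatever per-event processing you need):
filter {
  mutate { strip => ["message"] }   # runs while the event is still whole
  split { field => "message" }      # per the note above, anything after this is skipped
}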
Data Modification - mutate
Logstash also supports modifying event data in the filter stage, using the mutate plug-in.
Rename - rename
Renames an existing field:
filter { mutate { rename => ["syslog_host", "host"] } }
Update field content - update
Updates the content of a field; if the field does not exist, it will not be created:
filter { mutate { update => { "sample" => "My new message" } } }
Replace field contents - replace
Same as update, except that a new field is created when the field does not exist:
filter { mutate { replace => { "message" => "%{source_host}: My new message" } } }
Data type conversion - convert
filter { mutate { convert => ["request_time", "float"] } }
Text replacement - gsub
gsub provides text substitution, implemented with regular expressions:
filter {
  mutate {
    gsub => [
      # replace all forward slashes with underscore
      "fieldname", "/", "_",
      # replace backslashes, question marks, hashes, and minuses
      # with a dot "."
      "fieldname2", "[\\?#-]", "."
    ]
  }
}
Case conversion - uppercase, lowercase
filter { mutate { uppercase => [ "fieldname" ] } }
Remove whitespace characters - strip
Similar to trim in PHP: removes leading and trailing whitespace only:
filter { mutate { strip => ["field1", "field2"] } }
Delete field - remove, remove_field
remove is not recommended; use remove_field instead:
filter { mutate { remove_field => [ "foo_%{somefield}" ] } }
Split field - split
Splits a field into an array on a given separator character:
filter { mutate { split => ["message", "|"] } }
For the string "123|321|adfd|dfjld*=123", the output is:
{ "message" => [ [0] "123", [1] "321", [2] "adfd", [3] "dfjld*=123" ], "@version" => "1", "@timestamp" => "2014-08-20T15:58:23.120Z", "host" => "raochenlindeMacBook-Air.local" }
Joining arrays - join
Joins the elements of an array field into a single string, using the specified character as the separator.
For example, we can re-join the result produced by split above:
filter {
  mutate { split => ["message", "|"] }
  mutate { join => ["message", ","] }
}
Output:
{ "message" => "123,321,adfd,dfjld*=123", "@version" => "1", "@timestamp" => "2014-08-20T16:01:33.972Z", "host" => "raochenlindeMacBook-Air.local" }
Merging arrays - merge
Fields of type array, hash, or string can be combined with merge:
filter { mutate { merge => [ "dest_field", "added_field" ] } }
Note that an array field and a hash field cannot be merged with each other.
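A sketch of merging two array fields (the field names are assumptions for illustration): given tags_a => ["x"] and tags_b => ["y", "z"], after the filter below tags_a becomes ["x", "y", "z"]:
filter {
  mutate {
    merge => ["tags_a", "tags_b"]   # appends the elements of tags_b to tags_a
  }
}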