Logstash Summary
Official site: https://www.elastic.co/products/logstash
Overview
Logstash collects data and stores it in a database, in files, or in some other medium. It uses the FileWatch Ruby Gem to watch files for changes; by default it persists its read position (the sincedb file) every 15 seconds.
Features
Logstash is a capable yet lightweight log collector: it places no restrictions on where logs come from or what form they take, it works well for system monitoring and problem analysis, and the technical bar is low. It is implemented in Ruby and keeps an internal sincedb file that records the current read position in each log file, so you need not worry about Logstash skipping your data.
Execution Flow
Installing Logstash
1-1) Install Logstash
[root@hadoop1 bin]# tar -zxvf logstash-2.3.1.tar.gz
[root@hadoop1 logstash]# cd logstash-2.3.1/
[root@hadoop1 logstash-2.3.1]# cd bin/
1-2) Viewing the Logstash help
[root@hadoop1 bin]# ./logstash -help
Usage:
/bin/logstash agent [OPTIONS]
Options:
-f, --config CONFIG_PATH Load the logstash config from a specific file
or directory. If a directory is given, all
files in that directory will be concatenated
in lexicographical order and then parsed as a
single config file. You can also specify
wildcards (globs) and any matched files will
be loaded in the order described above.
-e CONFIG_STRING Use the given string as the configuration
data. Same syntax as the config file. If no
input is specified, then the following is
used as the default input:
"input { stdin { type => stdin } }"
and if no output is specified, then the
following is used as the default output:
"output { stdout { codec => rubydebug } }"
If you wish to use both defaults, please use
the empty string for the '-e' flag.
(default: "")
-w, --pipeline-workers COUNT Sets the number of pipeline workers to run.
(default: 1)
-b, --pipeline-batch-size SIZE Size of batches the pipeline is to work in.
(default: 125)
-u, --pipeline-batch-delay DELAY_IN_MS When creating pipeline batches, how long to wait while polling
for the next event.
(default: 5)
--filterworkers COUNT DEPRECATED. Now an alias for --pipeline-workers and -w
-l, --log FILE Write logstash internal logs to the given
file. Without this flag, logstash will emit
logs to standard output.
-v Increase verbosity of logstash internal logs.
Specifying once will show 'informational'
logs. Specifying twice will show 'debug'
logs. This flag is deprecated. You should use
--verbose or --debug instead.
--quiet Quieter logstash logging. This causes only
errors to be emitted.
--verbose More verbose logging. This causes 'info'
level logs to be emitted.
--debug Most verbose logging. This causes 'debug'
level logs to be emitted.
--debug-config translation missing: en.logstash.runner.flag.debug_config (default: false)
-V, --version Emit the version of logstash and its friends,
then exit.
-p, --pluginpath PATH A path of where to find plugins. This flag
can be given multiple times to include
multiple paths. Plugins are expected to be
in a specific directory hierarchy:
'PATH/logstash/TYPE/NAME.rb' where TYPE is
'inputs' 'filters', 'outputs' or 'codecs'
and NAME is the name of the plugin.
-t, --configtest Check configuration for valid syntax and then exit.
--[no-]allow-unsafe-shutdown Force logstash to exit during shutdown even
if there are still inflight events in memory.
By default, logstash will refuse to quit until all
received events have been pushed to the outputs. (default: false)
-r, --[no-]auto-reload Monitor configuration changes and reload
whenever it is changed.
NOTE: use SIGHUP to manually reload the config
(default: false)
--reload-interval RELOAD_INTERVAL How frequently to poll the configuration location
for changes, in seconds.
(default: 3)
--allow-env EXPERIMENTAL. Enables templating of environment variable
values. Instances of "${VAR}" in strings will be replaced
with the respective environment variable value named "VAR".
(default: false)
-h, --help print help
Kafka Example
[root@hadoop1 bin]# ./logstash -f flow-kafka.conf
Settings: Default pipeline workers: 1
Pipeline main started
[root@hadoop1 bin]# ./logstash -f kafka-es.conf
Settings: Default pipeline workers: 1
Pipeline main started
[root@hadoop1 ~]# kafka-console-consumer.sh --zookeeper hadoop1:2181 --from-beginning --topic folwKafka
1472703222.048 192.168.215.1 Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
1472703222.073 192.168.215.1 Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
Configuration File Examples
1-1) Input example
[root@hadoop1 bin]# ./logstash -e 'input{stdin{}}output{stdout{codec=>rubydebug}}'
Hello World
Settings: Default pipeline workers: 1
Pipeline main started
{
"message" => "Hello World",
"@version" => "1",
"@timestamp" => "2016-10-07T14:33:43.530Z",
"host" => "hadoop1"
}
1-2) Collecting data into Elasticsearch
[root@hadoop1 bin]# cat flow-es.conf
input {
file{
type => "logstash_es"
path => "/usr/local/nginx/logs/access.log"
# interval for discovering new files
discover_interval => 1
# where to begin reading
start_position => "beginning"
}
}
output {
elasticsearch {
index => "logstash-es-%{+YYYY.MM.dd}"
hosts => ["hadoop1:9200", "hadoop2:9200", "hadoop3:9200"]
}
}
exclude Files you do not want watched can be excluded.
close_older If a watched file sees no updates within this interval, the handle watching it is closed. Default: 3600 s (one hour).
ignore_older On each scan of the file list, files whose last modification time is older than this value are ignored. Default: 86400 s (one day).
sincedb_path Path of the .sincedb file; default $HOME/.sincedb.
sincedb_write_interval How often the sincedb file is written; default 15 s.
stat_interval How often watched files are checked for updates; default 1 s.
start_position Where Logstash starts reading a file. The default is the end, like tail -f; set it to "beginning" to read from the start (like cat) and then keep tailing after the last line.
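A sketch combining these options in one file input (the paths and values below are illustrative, not taken from the setup above):

```conf
input {
  file {
    path => "/usr/local/nginx/logs/*.log"
    exclude => "*.gz"              # skip rotated archives
    start_position => "beginning"  # read existing content, then tail
    stat_interval => 1             # check for updates every second
    sincedb_path => "/var/lib/logstash/.sincedb_nginx"
    sincedb_write_interval => 15   # persist read position every 15 s
    close_older => 3600            # release idle file handles after 1 h
    ignore_older => 86400          # skip files untouched for a day
  }
}
```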
1-3) Shipping data to Kafka
[root@hadoop1 bin]# cat flow-kafka.conf
input {
file {
path => "/usr/local/nginx/logs/access.log"
discover_interval => 5
start_position => "beginning"
}
}
output{
kafka {
topic_id => "logTest"
codec => plain {
format => "%{message}"
}
bootstrap_servers => "hadoop1:9092,hadoop2:9092,hadoop3:9092"
}
}
1-4) From Kafka into Elasticsearch
[root@hadoop1 bin]# cat kafka-es.conf
input {
kafka {
type => "kafka-es"
auto_offset_reset => "smallest"
codec => plain {
charset => "GB2312"
}
group_id => "es"
topic_id => "logTest"
zk_connect => "hadoop1:2181,hadoop2:2181,hadoop3:2181"
}
}
filter {
mutate {
split => { "message" => " " }
add_field => {
"ip_address" => "%{[message][2]}"
"liulanqi" => "%{[message][3]}"
}
remove_field => [ "message" ]
}
}
output {
elasticsearch {
index => "kafka-flow-es-%{+YYYY.MM.dd}"
codec => plain {
charset => "GB2312"
}
hosts => ["hadoop1:9200", "hadoop2:9200", "hadoop3:9200"]
}
}
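The mutate step above can be sketched in plain Ruby to show what happens to an event. The sample line and the indices are illustrative; adjust them to your actual log layout:

```ruby
# Mimic the mutate filter: split "message" on spaces, copy positional
# tokens into named fields, then drop the raw message.
event = {
  "message" => "1472703222.048 - 192.168.215.1 Mozilla/5.0"
}
tokens = event["message"].split(" ")
event["ip_address"] = tokens[2]   # corresponds to %{[message][2]}
event["liulanqi"]   = tokens[3]   # corresponds to %{[message][3]}
event.delete("message")
```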
1-5) Other configurations
input {
kafka {
type => "accesslogs"
codec => "plain"
auto_offset_reset => "smallest"
group_id => "elas1"
topic_id => "accesslogs"
zk_connect => "hadoop1:2181,hadoop2:2181,hadoop3:2181"
}
kafka {
type => "gamelogs"
auto_offset_reset => "smallest"
codec => "plain"
group_id => "elas2"
topic_id => "gamelogs"
zk_connect => "hadoop1:2181,hadoop2:2181,hadoop3:2181"
}
}
filter {
if [type] == "accesslogs" {
json {
source => "message"
remove_field => [ "message" ]
target => "access"
}
}
if [type] == "gamelogs" {
mutate {
split => { "message" => " " }
add_field => {
"event_type" => "%{[message][3]}"
"current_map" => "%{[message][4]}"
"current_X" => "%{[message][5]}"
"current_y" => "%{[message][6]}"
"user" => "%{[message][7]}"
"item" => "%{[message][8]}"
"item_id" => "%{[message][9]}"
"current_time" => "%{[message][12]}"
}
remove_field => [ "message" ]
}
}
}
output {
if [type] == "accesslogs" {
elasticsearch {
index => "accesslogs"
codec => "json"
hosts => ["hadoop1:9200", "hadoop2:9200", "hadoop3:9200"]
}
}
if [type] == "gamelogs" {
elasticsearch {
index => "gamelogs"
codec => plain {
charset => "UTF-16BE"
}
hosts => ["hadoop1:9200", "hadoop2:9200", "hadoop3:9200"]
}
}
}
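What the json filter does for accesslogs events can be sketched in Ruby. The message body here is hypothetical, just a stand-in for one access-log record:

```ruby
require "json"

# Parse the raw message as JSON and nest the result under "access",
# as the json filter with target => "access" does, then drop "message".
event = {
  "type"    => "accesslogs",
  "message" => '{"url":"/index.html","status":200}'
}
if event["type"] == "accesslogs"
  event["access"] = JSON.parse(event.delete("message"))
end
```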
#The whole config file has three parts: input, filter, output
#See the introduction at https://www.elastic.co/guide/en/logstash/current/configuration-file-structure.html
input {
#file can appear multiple times; alternatively, write a single file block and configure several files in its path setting to watch multiple files
file {
#type adds a field named type with the given value to each event. It corresponds to the type within the ES index: if no type is specified when writing to ES, this value becomes the document type of the index.
type => "apache-access"
path => "/apphome/ptc/Windchill_10.0/Apache/logs/access_log*"
#start_position can be beginning or end: beginning reads the file from the start, end reads only new data; use it together with ignore_older.
start_position => beginning
#sincedb_path points at the file that records read progress, one line per watched file; each line stores the file's inode and the byte offset read so far (some versions also record the device's major and minor numbers). Default: $HOME/.sincedb*
sincedb_path => "/opt/logstash-2.3.1/sincedb_path/access_progress"
#ignore_older controls how recently a file must have been modified to be watched, in seconds; the default is one day, i.e. by default only files modified within the last day are read.
ignore_older => 604800
#add_field adds a field. ${HOSTNAME} here is the machine's environment variable; to use environment variables, start Logstash with --allow-env.
add_field => {"log_hostname"=>"${HOSTNAME}"}
#the default delimiter is \n (newline); beware that setting it to the empty string "" makes every single character its own event
delimiter => ""
#close_older stops tracking a file after it has gone (by default) 3600 seconds without updates. This is especially useful with multiline. It relates to how Logstash reads files (two modes: read and tail).
close_older => 3600
codec => multiline {
pattern => "^\s"
#negate inverts the condition: it selects the lines that do NOT match pattern.
# negate => ""
#what takes one of two values, previous or next. Example: a Java stack trace continues on lines beginning with whitespace, so pattern matches a leading space and what => previous means such a line belongs to the same event as the line before it. Another example: a long shell command ending in \ belongs with the line after it; for that, use negate => true and what => 'next'.
what => "previous"
auto_flush_interval => 60
}
}
file {
type => "methodserver-log"
path => "/apphome/ptc/Windchill_10.0/Windchill/logs/MethodServer-1604221021-32380.log"
start_position => beginning
sincedb_path => "/opt/logstash-2.3.1/sincedb_path/methodserver_process"
# ignore_older => 604800
}
}
filter{
#run Ruby code; the example below converts the event date to a string and assigns it to daytag
ruby {
code => "event['daytag'] = event.timestamp.time.localtime.strftime('%Y-%m-%d')"
}
# if [path] =~ "access" {} else if [path] =~ "methodserver" {} else if [path] =~ "servermanager" {} else {} -- note the statement structure
if [path] =~ "MethodServer" { #=~ here matches against a regular expression
grok {
patterns_dir => ["/opt/logstash-2.3.1/patterns"] #directory of custom grok patterns
# Tue 4/12/16 14:24:17: TP-Processor2: hirecode---->77LS
match => { "message" => "%{DAY:log_weekday} %{DATE_US:log_date} %{TIME:log_time}: %{GREEDYDATA:log_data}"}
}
#mutate is used for transformations
mutate {
replace => { "type" => "apache" } #replace a field's value
convert => { #type conversion
"bytes" => "integer" #other targets include float
"duration" => "integer"
"state" => "integer"
}
#date handles dates found in the file's content: the value read from the file is a string, and date converts it into @timestamp. See https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html#plugins-filters-date-match
# date {
# match => [ "logTime" , "dd/MMM/yyyy:HH:mm:ss Z" ]
# }
}else if [type] in ['tbg_qas','mbg_pre'] { # if ... else if ... else if ... else structure
}else {
drop{} # discard the event
}
}
output {
stdout{ codec=>rubydebug} # print directly, handy for debugging
# output to Redis
redis {
host => '10.120.20.208'
data_type => 'list'
key => '10.99.201.34:access_log_2016-04'
}
# output to ES
elasticsearch {
hosts =>"192.168.0.15:9200"
index => "%{sysid}_%{type}"
document_type => "%{daytag}"
}
}
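The ruby filter's one-liner, run standalone. The epoch value is taken from the consumer output earlier in this document; the filter itself uses localtime, but UTC is used here so the result does not depend on the machine's time zone:

```ruby
# Format an event timestamp as "YYYY-MM-DD", as the ruby filter does
# when it assigns the daytag field.
ts = Time.at(1472703222).utc
daytag = ts.strftime("%Y-%m-%d")
puts daytag  # 2016-09-01
```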
Here is what a log line looks like:
55.3.244.1 GET /index.html 15824 0.043
An example grok pattern:
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
How is it written in the config file?
input {
file {
path => "/var/log/http.log"
}
}
filter {
grok {
match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
}
}
What does it look like after parsing?
client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
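A plain-Ruby approximation of that grok match, with named captures standing in for %{IP:client} and friends. The sub-patterns are simplified stand-ins, not grok's real pattern definitions:

```ruby
# Parse the sample line with named captures mirroring the grok fields.
line = "55.3.244.1 GET /index.html 15824 0.043"
m = line.match(
  /^(?<client>\d+\.\d+\.\d+\.\d+) (?<method>\w+) (?<request>\S+) (?<bytes>\d+) (?<duration>[\d.]+)$/
)
```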
Starting in the foreground
[root@hadoop1 bin]# ./logstash -f kafka-es.conf
Starting in the background
[root@hadoop1 bin]# ./logstash -f kafka-es.conf > /dev/null 2>&1 &
Viewing the data stored in ES
http://hadoop1:9200/_plugin/head/