Learn Big Data Fast -- Logstash Summary (27)


Logstash Summary

Official site: https://www.elastic.co/products/logstash

Overview

Logstash mainly collects data and delivers it to a database, a file, or some other medium. For file sources it uses the FileWatch Ruby Gem to watch files for changes: by default it checks the watched files for updates every second and records its read position in the sincedb file every 15 seconds.

Features

 

Logstash is a lightweight yet capable log collector: it does not restrict where logs come from or what format they take, which makes it a good fit for system monitoring and troubleshooting, and it does not demand much technical background. It is implemented in Ruby and keeps an internal sincedb file that records the current read position of each watched log file, so you do not need to worry about Logstash skipping your data.

Execution Flow
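A Logstash pipeline moves every event through three stages: inputs collect the raw data, filters parse and transform it, and outputs ship it to a destination such as Elasticsearch, Kafka, or Redis. All of the configuration examples below follow this input → filter → output structure.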

Installing Logstash

1-1) Install Logstash

[root@hadoop1 bin]# tar -zxvf logstash-2.3.1.tar.gz

[root@hadoop1 logstash]# cd logstash-2.3.1/

[root@hadoop1 logstash-2.3.1]# cd bin/

 

1-2) Viewing the Logstash help

[root@hadoop1 bin]# ./logstash -help

Usage:
    /bin/logstash agent [OPTIONS]

Options:
    -f, --config CONFIG_PATH      Load the logstash config from a specific file
                                  or directory.  If a directory is given, all
                                  files in that directory will be concatenated
                                  in lexicographical order and then parsed as a
                                  single config file. You can also specify
                                  wildcards (globs) and any matched files will
                                  be loaded in the order described above.
    -e CONFIG_STRING              Use the given string as the configuration
                                  data. Same syntax as the config file. If no
                                  input is specified, then the following is
                                  used as the default input:
                                  "input { stdin { type => stdin } }"
                                  and if no output is specified, then the
                                  following is used as the default output:
                                  "output { stdout { codec => rubydebug } }"
                                  If you wish to use both defaults, please use
                                  the empty string for the '-e' flag.
                                   (default: "")
    -w, --pipeline-workers COUNT  Sets the number of pipeline workers to run.
                                   (default: 1)
    -b, --pipeline-batch-size SIZE Size of batches the pipeline is to work in.
                                   (default: 125)
    -u, --pipeline-batch-delay DELAY_IN_MS When creating pipeline batches, how long to wait while polling
                                  for the next event.
                                   (default: 5)
    --filterworkers COUNT         DEPRECATED. Now an alias for --pipeline-workers and -w
    -l, --log FILE                Write logstash internal logs to the given
                                  file. Without this flag, logstash will emit
                                  logs to standard output.
    -v                            Increase verbosity of logstash internal logs.
                                  Specifying once will show 'informational'
                                  logs. Specifying twice will show 'debug'
                                  logs. This flag is deprecated. You should use
                                  --verbose or --debug instead.
    --quiet                       Quieter logstash logging. This causes only
                                  errors to be emitted.
    --verbose                     More verbose logging. This causes 'info'
                                  level logs to be emitted.
    --debug                       Most verbose logging. This causes 'debug'
                                  level logs to be emitted.
    --debug-config                translation missing: en.logstash.runner.flag.debug_config (default: false)
    -V, --version                 Emit the version of logstash and its friends,
                                  then exit.
    -p, --pluginpath PATH         A path of where to find plugins. This flag
                                  can be given multiple times to include
                                  multiple paths. Plugins are expected to be
                                  in a specific directory hierarchy:
                                  'PATH/logstash/TYPE/NAME.rb' where TYPE is
                                  'inputs' 'filters', 'outputs' or 'codecs'
                                  and NAME is the name of the plugin.
    -t, --configtest              Check configuration for valid syntax and then exit.
    --[no-]allow-unsafe-shutdown  Force logstash to exit during shutdown even
                                  if there are still inflight events in memory.
                                  By default, logstash will refuse to quit until all
                                  received events have been pushed to the outputs. (default: false)
    -r, --[no-]auto-reload        Monitor configuration changes and reload
                                  whenever it is changed.
                                  NOTE: use SIGHUP to manually reload the config
                                   (default: false)
    --reload-interval RELOAD_INTERVAL How frequently to poll the configuration location
                                  for changes, in seconds.
                                   (default: 3)
    --allow-env                   EXPERIMENTAL. Enables templating of environment variable
                                  values. Instances of "${VAR}" in strings will be replaced
                                  with the respective environment variable value named "VAR".
                                   (default: false)
    -h, --help                    print help
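Two of these flags are particularly handy while developing configurations: -t (--configtest) checks a config file's syntax without starting the pipeline, and -e takes the configuration inline. A quick sanity check might look like this (assuming flow-kafka.conf, created in the Kafka example below, is in the current directory):

[root@hadoop1 bin]# ./logstash -t -f flow-kafka.conf

If the syntax is valid, Logstash prints a confirmation that the configuration is OK and exits.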

Kafka Example

[root@hadoop1 bin]# ./logstash -f flow-kafka.conf
Settings: Default pipeline workers: 1
Pipeline main started

[root@hadoop1 bin]# ./logstash -f kafka-es.conf
Settings: Default pipeline workers: 1
Pipeline main started

 

[root@hadoop1 ~]# kafka-console-consumer.sh --zookeeper hadoop1:2181 --from-beginning --topic folwKafka

 

1472703222.048 192.168.215.1         Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36  

1472703222.073 192.168.215.1         Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36  

 

 

Configuration File Examples

1-1) Input example

[root@hadoop1 bin]# ./logstash -e 'input{stdin{}}output{stdout{codec=>rubydebug}}'
Hello World
Settings: Default pipeline workers: 1
Pipeline main started
{
       "message" => "Hello World",
      "@version" => "1",
    "@timestamp" => "2016-10-07T14:33:43.530Z",
          "host" => "hadoop1"
}

 

1-2) Collecting data into Elasticsearch

[root@hadoop1 bin]# cat flow-es.conf

input {
  file {
    type => "logstash_es"
    path => "/usr/local/nginx/logs/access.log"
    # How often (in seconds) to scan for new files matching the path
    discover_interval => 1
    # Where to start reading from
    start_position => "beginning"
  }
}

output {
  elasticsearch {
    index => "logstash-es-%{+YYYY.MM.dd}"
    hosts => ["hadoop1:9200", "hadoop2:9200", "hadoop3:9200"]
  }
}

 

exclude    Files that should not be watched can be excluded with this option.
close_older    If a watched file receives no updates for this long, the handle monitoring it is closed. Default: 3600s, i.e. one hour.
ignore_older    On each scan of the file list, files whose last modification time is older than this value are ignored. Default: 86400s, i.e. one day.
sincedb_path    Path of the .sincedb file; defaults to $HOME/.sincedb.
sincedb_write_interval    How often the sincedb file is written. Default: 15s.
stat_interval    How often the watched files are checked for updates. Default: 1s.
start_position    Where Logstash starts reading file data. The default is the end of the file, similar to tail -f. Set it to "beginning" to read from the start (like cat) and then keep following new lines like tail -f.
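As a sketch of how several of these options fit together in one file input (the exclude pattern and the sincedb location below are illustrative, not taken from the configs in this article):

input {
  file {
    path => "/usr/local/nginx/logs/access.log"
    exclude => "*.gz"                  # do not watch rotated, compressed logs
    start_position => "beginning"      # read existing content once, then keep tailing
    stat_interval => 1                 # check the watched file for updates every second
    sincedb_write_interval => 15       # persist the read position every 15 seconds
    sincedb_path => "/opt/logstash-2.3.1/sincedb_path/nginx_access"
    ignore_older => 86400              # skip files not modified within the last day
    close_older => 3600                # release handles of files idle for an hour
  }
}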

 

1-3) Collecting data into Kafka

[root@hadoop1 bin]# cat flow-kafka.conf

input {
  file {
    path => "/usr/local/nginx/logs/access.log"
    discover_interval => 5
    start_position => "beginning"
  }
}

output {
  kafka {
    topic_id => "logTest"
    codec => plain {
      format => "%{message}"
    }
    bootstrap_servers => "hadoop1:9092,hadoop2:9092,hadoop3:9092"
  }
}
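To confirm that events are reaching Kafka, the topic named in this output can be read back with the same console consumer used earlier:

[root@hadoop1 ~]# kafka-console-consumer.sh --zookeeper hadoop1:2181 --from-beginning --topic logTest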

 

1-4) From Kafka into Elasticsearch

[root@hadoop1 bin]# cat kafka-es.conf

input {
  kafka {
    type => "kafka-es"
    auto_offset_reset => "smallest"
    codec => plain {
      charset => "GB2312"
    }
    group_id => "es"
    topic_id => "logTest"
    zk_connect => "hadoop1:2181,hadoop2:2181,hadoop3:2181"
  }
}

filter {
  mutate {
    split => { "message" => "  " }
    add_field => {
      "ip_address" => "%{message[2]}"
      "liulanqi" => "%{message[3]}"
    }
    remove_field => [ "message" ]
  }
}

output {
  elasticsearch {
    index => "kafka-flow-es-%{+YYYY.MM.dd}"
    codec => plain {
      charset => "GB2312"
    }
    hosts => ["hadoop1:9200", "hadoop2:9200", "hadoop3:9200"]
  }
}
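Once this pipeline is running, a quick way to check that documents are arriving (besides the head plugin mentioned at the end of this article) is to query the index pattern used above; this is only a sanity-check sketch:

[root@hadoop1 ~]# curl 'http://hadoop1:9200/kafka-flow-es-*/_search?pretty&size=1'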

 

1-5) Other configurations

A) KafkaToES

input {
  kafka {
    type => "accesslogs"
    codec => "plain"
    auto_offset_reset => "smallest"
    group_id => "elas1"
    topic_id => "accesslogs"
    zk_connect => "hadoop1:2181,hadoop2:2181,hadoop3:2181"
  }

  kafka {
    type => "gamelogs"
    auto_offset_reset => "smallest"
    codec => "plain"
    group_id => "elas2"
    topic_id => "gamelogs"
    zk_connect => "hadoop1:2181,hadoop2:2181,hadoop3:2181"
  }
}

filter {
  if [type] == "accesslogs" {
    json {
      source => "message"
      remove_field => [ "message" ]
      target => "access"
    }
  }

  if [type] == "gamelogs" {
    mutate {
      split => { "message" => " " }
      add_field => {
        "event_type" => "%{message[3]}"
        "current_map" => "%{message[4]}"
        "current_X" => "%{message[5]}"
        "current_y" => "%{message[6]}"
        "user" => "%{message[7]}"
        "item" => "%{message[8]}"
        "item_id" => "%{message[9]}"
        "current_time" => "%{message[12]}"
      }
      remove_field => [ "message" ]
    }
  }
}

output {
  if [type] == "accesslogs" {
    elasticsearch {
      index => "accesslogs"
      codec => "json"
      hosts => ["hadoop1:9200", "hadoop2:9200", "hadoop3:9200"]
    }
  }

  if [type] == "gamelogs" {
    elasticsearch {
      index => "gamelogs"
      codec => plain {
        charset => "UTF-16BE"
      }
      hosts => ["hadoop1:9200", "hadoop2:9200", "hadoop3:9200"]
    }
  }
}
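Because each kafka input tags its events with a type, the filter and output blocks can route the two topics to separate indices. The topics referenced above must already exist in Kafka (or topic auto-creation must be enabled); a sketch of creating them with the standard Kafka CLI follows, where the partition and replication-factor values are illustrative only:

[root@hadoop1 ~]# kafka-topics.sh --create --zookeeper hadoop1:2181 --partitions 3 --replication-factor 2 --topic accesslogs
[root@hadoop1 ~]# kafka-topics.sh --create --zookeeper hadoop1:2181 --partitions 3 --replication-factor 2 --topic gamelogs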

B) Parameter notes

# The whole configuration file consists of three sections: input, filter, output.
# See https://www.elastic.co/guide/en/logstash/current/configuration-file-structure.html
input {
  # The file plugin can be used multiple times; alternatively, a single file block can
  # list several files in its path option to monitor them all.
  file {
    # type adds a field named "type" with the given value to every event. This corresponds
    # to the document type in the ES index: if no type is specified when writing to ES,
    # this value is used as the type within the index.
    type => "apache-access"
    path => "/apphome/ptc/Windchill_10.0/Apache/logs/access_log*"
    # start_position can be beginning or end: beginning reads the file from the start,
    # end only reads the newest data. It should be used together with ignore_older.
    start_position => beginning
    # sincedb_path records the read progress: one line per file, containing the file's
    # inode and the byte offset read so far. Defaults to $HOME/.sincedb*
    sincedb_path => "/opt/logstash-2.3.1/sincedb_path/access_progress"
    # ignore_older controls how recently a file must have been modified to be monitored.
    # The default is one day, in seconds; for example, by default only files modified
    # within the last day are read.
    ignore_older => 604800
    # add_field adds a field to the event. ${HOSTNAME} is the machine's environment
    # variable; using environment variables requires the --allow-env flag at startup.
    add_field => {"log_hostname"=>"${HOSTNAME}"}
    # delimiter defaults to "\n" (newline); setting it to the empty string "" has the
    # consequence that every single character becomes its own event.
    delimiter => ""
    # close_older stops tracking a file after (by default) 3600 seconds of inactivity.
    # This is particularly useful together with multiline, and is tied to how Logstash
    # reads files (read vs. tail mode).
    close_older => 3600
    codec => multiline {
      pattern => "^\s"
      # negate inverts the pattern, i.e. it matches lines that do NOT satisfy the pattern.
#      negate => ""
      # what takes one of two values, previous or next. For example, a Java exception's
      # second line starts with whitespace, so pattern matches a leading space and
      # what => "previous" means such a line belongs to the same event as the line above.
      # Conversely, when a long command ends with \ to continue on the next line, use
      # negate => true, what => 'next'.
      what => "previous"
      auto_flush_interval => 60
    }
  }
  file {
    type => "methodserver-log"
    path => "/apphome/ptc/Windchill_10.0/Windchill/logs/MethodServer-1604221021-32380.log"
    start_position => beginning
    sincedb_path => "/opt/logstash-2.3.1/sincedb_path/methodserver_process"
#    ignore_older => 604800
  }
}
filter {
  # Run a Ruby snippet; this example formats the event date as a string in the daytag field.
  ruby {
    code => "event['daytag'] = event.timestamp.time.localtime.strftime('%Y-%m-%d')"
  }
  # Note the statement structure:
  # if [path] =~ "access" {} else if [path] =~ "methodserver" {} else if [path] =~ "servermanager" {} else {}
  if [path] =~ "MethodServer" { # =~ matches against a regular expression
    grok {
      patterns_dir => ["/opt/logstash-2.3.1/patterns"] # directory of custom patterns
#      Tue 4/12/16 14:24:17: TP-Processor2: hirecode---->77LS
      match => { "message" => "%{DAY:log_weekday} %{DATE_US:log_date} %{TIME:log_time}: %{GREEDYDATA:log_data}"}
    }
    # mutate performs transformations
    mutate {
      replace => { "type" => "apache" } # replace a field's value
      convert => { # type conversion
        "bytes" => "integer" # float is also available, for example
        "duration" => "integer"
        "state" => "integer"
      }
    }
    # date parses dates found in the file content: the value read is a string, and date
    # converts it into @timestamp. See
    # https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html#plugins-filters-date-match
#    date {
#      match => [ "logTime" , "dd/MMM/yyyy:HH:mm:ss Z" ]
#    }
  } else if [type] in ['tbg_qas','mbg_pre'] { # if ... else if ... else if ... else structure
  } else {
    drop{} # discard the event
  }
}
output {
  stdout{ codec=>rubydebug} # print directly; convenient for debugging
  # Output to Redis
  redis {
    host => '10.120.20.208'
    data_type => 'list'
    key => '10.99.201.34:access_log_2016-04'
  }
  # Output to ES
  elasticsearch {
    hosts =>"192.168.0.15:9200"
    index => "%{sysid}_%{type}"
    document_type => "%{daytag}"
  }
}

 

C) Configuration example

Here is what a log line looks like:

55.3.244.1 GET /index.html 15824 0.043

 

The grok pattern for it:

%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

 

How is this written in the configuration file?

 

input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
  }
}

 

After parsing, what does the event look like?

 

client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
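The pattern can also be tried interactively without writing a config file by passing only the filter to -e; the stdin/stdout defaults described in the help output above supply the input and output. A quick sketch (type or paste the sample log line once the pipeline has started):

[root@hadoop1 bin]# ./logstash -e 'filter { grok { match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ] } }'
55.3.244.1 GET /index.html 15824 0.043

The rubydebug output should then contain the client, method, request, bytes, and duration fields listed above.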

Starting in the foreground

[root@hadoop1 bin]# ./logstash -f kafka-es.conf

 

Starting in the background

[root@hadoop1 bin]# ./logstash -f kafka-es.conf > /dev/null 2>&1 &
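If the shell session may be closed, nohup can be added so the process survives the logout (a minor variation on the command above):

[root@hadoop1 bin]# nohup ./logstash -f kafka-es.conf > /dev/null 2>&1 &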

 

Viewing the data stored in ES

http://hadoop1:9200/_plugin/head/
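If the head plugin is not installed, the same check can be made from the command line by listing the indices (a sketch using the standard cat API):

[root@hadoop1 ~]# curl 'http://hadoop1:9200/_cat/indices?v'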

 

 
