Logstash is a tool for collecting, processing, and transporting data; it is commonly paired with Elasticsearch and Kibana (the ELK stack).
• Logstash features:
– centralized processing of all types of data
– normalization of data in differing schemas and formats
– rapid extension to custom log formats
– easy plugin additions for custom data sources
https://www.elastic.co/guide/en/logstash/current/input-plugins.html // plugin documentation
Logstash pipeline structure
– { data source } ==> input { } ==> filter { } ==> output { } ==> { ES }
• Logstash value types (a combined sketch follows the list)
– boolean: ssl_enable => true
– bytes: bytes => "1MiB"
– string: name => "xkops"
– number: port => 22
– array: match => ["datetime","UNIX"]
– hash: options => {k => "v",k2 => "v2"}
– codec: codec => "json"
– path: file_path => "/tmp/filename"
– comment: #
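A minimal sketch combining several of these value types in a single input block; the tcp settings mirror options used later in this document:
input {
    tcp {
        host => "0.0.0.0"        # string
        port => 8888             # number
        ssl_enable => false      # boolean
        type => "tcplog"         # string; becomes the event's type field
    }
}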
• Logstash conditionals (see the filter sketch after this list)
– equal: ==
– not equal: !=
– less than: <
– greater than: >
– less than or equal: <=
– greater than or equal: >=
– regex match: =~
– regex non-match: !~
– membership: in
– non-membership: not in
– and: and
– or: or
– nand: nand
– xor: xor
– compound expression: ()
– negated compound expression: !()
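A small sketch using these operators inside a filter block; the type and status fields are hypothetical and would come from your own events:
filter {
    # keep only 5xx responses from weblog events (illustrative logic)
    if [type] == "weblog" and [status] !~ /^5/ {
        drop { }
    }
}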
• Logstash plugins (codec, input, filter, and output classes)
– codec is a general-purpose class; input handles data sources, filter handles data processing, output handles data destinations
– for unfamiliar plugins, consult the online documentation
logstash-codec-collectd
logstash-codec-dots
logstash-codec-edn
logstash-codec-edn_lines
logstash-codec-es_bulk
logstash-codec-fluent
logstash-codec-graphite
logstash-codec-json
logstash-codec-json_lines
logstash-codec-line
logstash-codec-msgpack
logstash-codec-multiline
logstash-codec-netflow
logstash-codec-oldlogstashjson
logstash-codec-plain
logstash-codec-rubydebug
logstash-filter-anonymize
logstash-filter-checksum
logstash-filter-clone
logstash-filter-csv
logstash-filter-date
logstash-filter-dns
logstash-filter-drop
logstash-filter-fingerprint
logstash-filter-geoip
logstash-filter-grok
logstash-filter-json
logstash-filter-kv
logstash-filter-metrics
logstash-filter-multiline
logstash-filter-mutate
logstash-filter-ruby
logstash-filter-sleep
logstash-filter-split
logstash-filter-syslog_pri
logstash-filter-throttle
logstash-filter-urldecode
logstash-filter-useragent
logstash-filter-uuid
logstash-filter-xml
logstash-input-beats
logstash-input-couchdb_changes
logstash-input-elasticsearch
logstash-input-eventlog
logstash-input-exec
logstash-input-file
logstash-input-ganglia
logstash-input-gelf
logstash-input-generator
logstash-input-graphite
logstash-input-heartbeat
logstash-input-http
logstash-input-http_poller
logstash-input-imap
logstash-input-irc
logstash-input-jdbc
logstash-input-kafka
logstash-input-log4j
logstash-input-lumberjack
logstash-input-pipe
logstash-input-rabbitmq
logstash-input-redis
logstash-input-s3
logstash-input-snmptrap
logstash-input-sqs
logstash-input-stdin
logstash-input-syslog
logstash-input-tcp
logstash-input-twitter
logstash-input-udp
logstash-input-unix
logstash-input-xmpp
logstash-input-zeromq
logstash-output-cloudwatch
logstash-output-csv
logstash-output-elasticsearch
logstash-output-email
logstash-output-exec
logstash-output-file
logstash-output-ganglia
logstash-output-gelf
logstash-output-graphite
logstash-output-hipchat
logstash-output-http
logstash-output-irc
logstash-output-juggernaut
logstash-output-kafka
logstash-output-lumberjack
logstash-output-nagios
logstash-output-nagios_nsca
logstash-output-null
logstash-output-opentsdb
logstash-output-pagerduty
logstash-output-pipe
logstash-output-rabbitmq
logstash-output-redis
logstash-output-s3
logstash-output-sns
logstash-output-sqs
logstash-output-statsd
logstash-output-stdout
logstash-output-tcp
logstash-output-udp
logstash-output-xmpp
logstash-output-zeromq
logstash-patterns-core // the core pattern definitions used for grok filter matching
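The locally installed plugins can also be listed from the command line; for the 2.x layout installed below the tool is bin/plugin (later releases renamed it bin/logstash-plugin):
[root@logstash ~]# /opt/logstash/bin/plugin list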
• Installing Logstash
– Logstash depends on a Java runtime; install java-1.8.0-openjdk
– Logstash ships with no default configuration file; you must write one by hand
– Logstash installs into the /opt/logstash directory
[root@logstash ~]# yum -y install logstash-2.3.4-1.noarch.rpm // newer versions can be downloaded online
• Logstash's first configuration file
[root@logstash ~]# vim /etc/logstash/conf.d/logstash.conf
input{
stdin{}
}
filter{ }
output{
stdout{}
}
• Testing
[root@logstash ~]# /opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
Settings: Default pipeline workers: 2
Pipeline main started
test
test
//type anything; the same content is echoed back automatically
— codec plugins
input{
stdin{ codec => "json" }
}
filter{ }
output{
stdout{ codec => "rubydebug" }
}
– compare a plain-text line with the JSON line below: with codec => "json" on stdin the JSON is parsed into event fields, while a plain line cannot be parsed and is tagged accordingly
– {"a": 11, "c": 13, "b": 12}
— input file plugin
input {
    file {
        path => ["/tmp/alog", "/tmp/blog"]
        sincedb_path => "/var/lib/logstash/sincedb-access"
        start_position => "beginning"
        type => "filelog"
    }
}
– sincedb_path records how far into each file has been read
– start_position sets where the first read of a new file begins
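With Logstash running against this config, the input can be exercised by appending to one of the watched files:
[root@logstash ~]# echo "file plugin test line" >> /tmp/alog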
— input tcp and udp plugins
input {
    tcp {
        host => "0.0.0.0"
        port => 8888
        type => "tcplog"
    }
    udp {
        host => "192.168.4.16"
        port => 9999
        type => "udplog"
    }
}
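Assuming bash on the sending host (the client prompt here is illustrative), both listeners can be exercised without extra tools via bash's /dev/tcp and /dev/udp redirection:
[root@client ~]# echo "tcp test message" > /dev/tcp/192.168.4.16/8888
[root@client ~]# echo "udp test message" > /dev/udp/192.168.4.16/9999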
— syslog plugin
input {
    syslog {
        host => "192.168.4.10"
        port => 514
        type => "syslog"
    }
}
– configure rsyslog.conf to forward messages to this listener // on the host whose logs you want to ship
// in rsyslog.conf, @@192.168.4.10:514 forwards over TCP; @192.168.4.10:514 forwards over UDP
// pick the facility/priority to forward, e.g. local0.info, and append the destination address
local0.info @@192.168.4.10:514
– write a syslog message and check the result:
logger -p local0.info -t test_logstash 'test message'
[root@web2 bin]# cat /etc/logstash/conf.d/logstash.conf
input {
file{
path => ["/tmp/a.log","/tmp/b.log"]
sincedb_path => "/var/lib/logstash/since.db"
start_position => "beginning"
type => "filelog"
}
}
filter { }
output {
stdout{ codec => "rubydebug" }
}
— filter grok plugin
– a plugin for parsing all kinds of unstructured log data
– grok uses regular expressions to give unstructured data structure
– with named-group matching; the regex must be written for the specific data format
– hard to write, but extremely widely applicable
– usable with almost any kind of data
filter {
    grok {
        match => ["message", "%{IP:ip}, (?<key>reg)"]
    }
}
– grok named-group matching
Match the client IP, timestamp, and request method:
"(?<ip>(\d+\.){3}\d+) \S+ \S+ (?<time>.*\])\s+\"(?<method>[A-Z]+)"
Using the built-in pattern macros:
%{IPORHOST:clientip} %{HTTPDUSER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"%{WORD:verb}
Final version:
%{COMMONAPACHELOG} \"(?<referer>[^\"]+)\" \"(?<UA>[^\"]+)\"
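Putting the final pattern into a complete filter block, following the match syntax shown earlier (a sketch; adjust the pattern to your own log format):
filter {
    grok {
        match => ["message", "%{COMMONAPACHELOG} \"(?<referer>[^\"]+)\" \"(?<UA>[^\"]+)\""]
    }
}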
— output ES plugin
output {
    if [type] == "filelog" {
        elasticsearch {
            hosts => ["192.168.4.15:9200"]
            index => "weblog"
            flush_size => 2000
            idle_flush_time => 10
        }
    }
}
//once debugging succeeds, the data is written into the ES cluster
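One way to confirm that documents are arriving, querying the weblog index on the ES node configured above:
[root@logstash ~]# curl 'http://192.168.4.15:9200/weblog/_search?pretty'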
— input beats plugin
//used together with hosts that have filebeat installed; it receives the log data those filebeat shippers send
//configured on the logstash server
input {
    beats {
        port => 5044
    }
}
– This plugin receives data sent by beats-family shippers. Because Logstash depends on Java and consumes considerable resources, we usually do not want to install Java and Logstash on every machine in the cluster; the much lighter filebeat is used instead.
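Once Logstash is started with this input, a quick sanity check is to confirm the listener is up (standard iproute2 tooling):
[root@logstash ~]# ss -ltnp | grep 5044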
— filebeat, the sending side
//installed on the machines whose log data is to be shipped
– install filebeat from the rpm
# yum -y install filebeat-1.2.3-x86_64.rpm
– edit the config file /etc/filebeat/filebeat.yml
..
paths:
    - /var/log/httpd/access_log // multiple paths are allowed; add each on its own line, e.g. - /var/log/httpd/data.log
input_type: log // filebeat accepts only "log" or "stdin" here; the event type is set via document_type
..
document_type: apachelog // becomes the event's type field, matched by the Logstash filters later in this document
..
output:
..
    #elasticsearch: // remember to comment out this line
..
    #hosts: ["localhost:9200"] // remember to comment out this line too
...
    logstash:
..
        hosts: ["192.168.4.10:5044"]
..
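If your filebeat build supports it, the configuration can be checked before starting the service (the -configtest switch is assumed here from the 1.x beats series; skip this step if your version lacks it):
# filebeat -configtest -c /etc/filebeat/filebeat.yml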
– enable at boot
# systemctl enable filebeat
– start the service
# systemctl start filebeat
//after the service starts, the client-side setup is complete
//browse the web server on this machine (refresh a few times and wait a few seconds)
//entries are written to the access_log and then shipped automatically to the Logstash host
----- check on the Logstash server
[root@web2 ~]# /opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
Settings: Default pipeline workers: 2
Pipeline main started
{
"message" => "192.168.1.254 - - [06/Jul/2018:20:10:31 +0800] \"GET / HTTP/1.1\" 304 - \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36\"",
"@version" => "1",
"@timestamp" => "2018-07-06T12:10:40.727Z",
"input_type" => "log",
"count" => 1,
...
//the log data has been received
------ putting it all together: ELK
1) Logstash server configuration file (IP: 192.168.1.134)
[root@logstash bin]# cat /etc/logstash/conf.d/logstash.conf
input {
tcp {
port => 8888
type => "test"
}
beats {
port => 5044
}
}
filter {
if [type] == "apachelog" {
grok {
match => {
"message" => "%{COMBINEDAPACHELOG}"
}
}
}
}
output {
stdout {
codec => "rubydebug"
}
if [type] == "apachelog" {
elasticsearch {
hosts => ["192.168.1.200:9200","192.168.1.203:9200"]
index => "apachelog"
flush_size => 2000
idle_flush_time => 10
}
}
}
2) Once the Elasticsearch service is set up, it needs essentially no extra configuration (head plugin: http://192.168.1.200:9200/_plugin/head/)
3) Connect Kibana to the ES cluster, then build the views and dashboards.
//the end result: web logs are monitored in real time and displayed graphically, categorized, in real time