Introduction, deployment and use of Logstash

Our project uses the ELK stack as its log management tool, and the Elasticsearch side has already been covered in an earlier article. In this project, two Logstash nodes are deployed to consume messages from a Kafka cluster and write them to an Elasticsearch cluster for log storage. Drawing on project practice, the official documentation, and other online material, this article continues the series with an introduction to Logstash in the ELK technology stack: how it works, how to install and run it, and how to configure it. Omissions are inevitable; comments and corrections from readers are very welcome so we can keep learning together. Thank you!

1. Introduction to Logstash

Logstash is an open-source data collection engine with real-time pipeline processing capabilities. In the ELK Stack, its main job is to collect log data, format and normalize it internally, and then send it to a designated receiver, usually Elasticsearch. Logstash relies mainly on its rich plugin ecosystem to process the collected data.

Advantages of Logstash

1. Scalability
Beats should be load balanced across a group of Logstash nodes, and at least two Logstash nodes are recommended for high availability.
It is common to deploy only one Beats input per Logstash node, but a node can also expose multiple Beats inputs to provide independent endpoints for different data sources.
2. Resilience
Logstash persistent queues provide protection across node failures (a minimal settings sketch follows this list). For disk-level resilience in Logstash, disk redundancy is essential.
For on-premises deployments, RAID is recommended. When running in cloud or containerized environments, use persistent disks with a replication strategy that reflects your data SLA.
3. Filtering
Logstash performs general transformations on event fields: fields in an event can be renamed, removed, replaced, and modified.
It has an extensible plugin ecosystem of more than 200 plugins, plus the flexibility to create and contribute your own.
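As a rough sketch of the resilience point above, persistent queues are turned on in Logstash's settings file, logstash.yml; the path and size below are illustrative values chosen for this example, not settings taken from the project:

# logstash.yml: enable the disk-backed persistent queue
queue.type: persistent              # default is "memory"
path.queue: /data/logstash/queue    # illustrative path; defaults to <path.data>/queue
queue.max_bytes: 1024mb             # upper bound on the on-disk queue size (illustrative)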

Disadvantages of Logstash

Logstash is relatively resource-hungry: it uses a lot of CPU and memory at runtime. It also has no built-in message queue to buffer data, so there is a risk of data loss, which is one reason a Kafka cluster sits in front of Logstash in this project.

2. Logstash working process

(Figure: the Logstash event processing pipeline: inputs, filters, and outputs, with codecs applied as data enters and leaves the pipeline)

The Logstash event processing pipeline has three stages: input, filter, output. Inputs generate events, filters modify them, and outputs ship them elsewhere. Inputs and outputs support codecs, which encode or decode data as it enters or leaves the pipeline without requiring a separate filter. All four parts are implemented as plugins: the user writes a pipeline configuration file that specifies which input, filter, output, and codec plugins to use, in order to achieve a particular combination of data collection, processing, and output.

Inputs: ingest data from a data source; common plugins include file, syslog, redis, and beats.
Filters: process data, e.g. format conversion and field derivation; common plugins include grok, mutate, drop, clone, and geoip.
Outputs: emit data; common plugins include elasticsearch, file, graphite, and statsd.
Codecs: codec plugins are not a separate stage of the pipeline; they are modules used inside input and output plugins to encode or decode data. Common codecs include json and multiline (a minimal example follows this list).
Logstash is therefore not just an input | filter | output data flow, but an input | decode | filter | encode | output data flow; codecs are what decode and encode events.
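A minimal sketch of the decode/encode idea, assuming the input arrives as one JSON document per line (illustrative only, not from the project configuration):

input {
    stdin {
        codec => json        # decode: parse each input line as JSON into event fields
    }
}
output {
    stdout {
        codec => rubydebug   # encode: pretty-print the structured event to the console
    }
}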

3. Install and run Logstash

  1. Download the installation package

     https://artifacts.elastic.co/downloads/logstash/logstash-6.2.2.tar.gz
    
  2. Decompress

     tar -zvxf logstash-6.2.2.tar.gz
    
  3. Start running

     ./logstash -f  input_out.conf
      input_out.conf is the pipeline configuration file; it specifies the input data source, the filters to apply, and the output destination (run the command from the bin directory of the extracted package).
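Before pointing Logstash at a real cluster, the configuration can also be validated and hot-reloaded with standard command-line flags; this is an optional extra step, not part of the original procedure:

     ./logstash -f input_out.conf --config.test_and_exit     # only check the configuration syntax, then exit
     ./logstash -f input_out.conf --config.reload.automatic  # pick up config file changes without a restart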
    

4. Logstash configuration

Configuration file examples: standard console input and output, file input to console output, console input to Elasticsearch output, Kafka input to Elasticsearch output, and so on.

Standard console input and output configuration

input {
    stdin{
    }
}
output {
    stdout{
    }
}
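To try this simplest pipeline without writing a file at all, the same configuration can be passed inline with -e (a quick way to experiment; not one of the original steps):

./logstash -e 'input { stdin {} } output { stdout {} }'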

File input to standard console output

input {
    file {
        path => "/home/u-0/logstash/logstash-6.2.2/logs/logstash-plain.log"
        start_position => "beginning"
    }
}

filter {
    grok {
        match => {
            "message" => "%{DATA:clientIp} "
        }

        remove_field => "message"
    }
    date {
        match => ["accessTime", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

Grok is the most important Logstash filter plugin. Its main function is to parse plain-text strings into structured data by matching them against regular-expression-based patterns.

input/file/path: the file(s) to scan. Multiple files can be matched with the * path wildcard, provided as an array (path => ["outer-access.log","access.log"]), or given as a directory, in which case Logstash scans every file under that path and watches for new files.

filter/grok/match/message: DATA is one of grok's built-in regular expressions; it matches any characters.

filter/date/match: declares how the HTTPDATE-style date should be interpreted. Joda supports many complex date formats, and the format must be specified here for the field to be matched correctly.

remove_field => "message": drops the original raw log line, keeping only the fields parsed out by the filter. Try removing this setting to see what it does.
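As a rough illustration of how a grok pattern maps raw text onto fields (the sample log line and field names below are invented for this example, not taken from the project):

# Sample line:  192.168.1.10 GET /index.html 200
filter {
    grok {
        match => {
            "message" => "%{IP:clientIp} %{WORD:method} %{URIPATHPARAM:uri} %{NUMBER:status}"
        }
    }
}
# Resulting fields: clientIp => "192.168.1.10", method => "GET", uri => "/index.html", status => "200"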

Standard console input to Elasticsearch output

input{
        stdin {}
}
output {
        elasticsearch {
                hosts => ["192.168.65.146:9201","192.168.65.148:9202","192.168.65.149:9203"]
                index => "test_index"
                user => "elastic"
                password => "pwd123"
        }
        stdout { codec => rubydebug}
}

Kafka input to Elasticsearch output

input {
        kafka {
                type => "log"
                bootstrap_servers => "192.168.65.146:9092,192.168.65.148:9092,192.168.65.149:9092"
                group_id => "log_consumer_group"
                auto_offset_reset => "earliest"
                consumer_threads => 1
                decorate_events => true
                topics => ["log1","log2"]
                codec => "json"
        }
}
filter {
	mutate {
			lowercase => ["service"]
	}
}

output {
        elasticsearch {
                hosts => ["192.168.65.146:9201","192.168.65.148:9202","192.168.65.149:9203"]
                index => "test_index"
                user => "elastic"
                password => "pwd123"
        }
        stdout { codec => rubydebug}
}
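With decorate_events => true, the Kafka input attaches message metadata (topic, partition, offset, key) to each event; in recent versions of the plugin this lands under the event's [@metadata][kafka] field. As an illustrative variation on the output above (not part of the project configuration, and assuming that metadata layout), the metadata can be used to route documents into per-topic, per-day indices:

output {
        elasticsearch {
                hosts => ["192.168.65.146:9201","192.168.65.148:9202","192.168.65.149:9203"]
                # illustrative: derive the index name from the Kafka topic and the event date
                index => "log-%{[@metadata][kafka][topic]}-%{+YYYY.MM.dd}"
                user => "elastic"
                password => "pwd123"
        }
}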

