Introduction to Logstash
Introduction
Logstash is an open-source, server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and sends it to your favorite "stash" (ours is, of course, Elasticsearch).
Going back to our Elastic Stack architecture diagram, we can see that Logstash fills the data-processing role: data that needs processing is sent to Logstash first, while data that does not is sent directly to Elasticsearch.
Usage
Logstash can process a variety of inputs, such as documents, charts, and databases, and send them to Elasticsearch after processing.
Deployment and installation
Logstash mainly processes data from the source line by line, and it can also filter and split the data directly.
First, go to the official website to download Logstash: portal
Choose the version you need, or download it directly with wget:
# Check the JDK environment; JDK 1.8+ is required
java -version
# Download
wget https://artifacts.elastic.co/downloads/logstash/logstash-8.8.1-linux-x86_64.tar.gz
# Extract the archive
tar -xvf logstash-8.8.1-linux-x86_64.tar.gz
mv logstash-8.8.1 logstash
# First Logstash example: define standard input and output
bin/logstash -e 'input { stdin { } } output { stdout {} }'
Test
Type hello in the console, and you will see it echoed in the output immediately.
Configuration details
A Logstash configuration has three sections, as shown below:
input {
    # Input
    stdin { ... }    # Standard input
}
filter {
    # Filtering: split, extract, and otherwise process the data
    ...
}
output {
    # Output
    stdout { ... }    # Standard output
}
Input
- Collects data of all shapes, sizes, and sources; data often lives in many forms, scattered or centralized across many systems.
- Logstash supports a variety of input choices to capture events from many common sources at the same time. Easily ingest data from your logs, metrics, web applications, data stores, and various AWS services in a continuous stream.
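As a hedged sketch (the path and port below are invented for illustration), an input section can declare several of these sources at once:

```
input {
    # Read log files from disk (hypothetical path)
    file {
        path => "/var/log/app/*.log"
    }
    # Receive events from Beats shippers such as Filebeat (5044 is the conventional port)
    beats {
        port => 5044
    }
}
```

Logstash runs all declared inputs concurrently, so one pipeline can merge events from many sources.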
Filter
- Parse and transform data in real time
- As data travels from source to repository, Logstash filters parse individual events, identify named fields to build structures, and transform them into a common format for easier and faster analysis and business value.
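As one illustration of this (a sketch only, not part of this tutorial's pipeline), a filter block might parse an Apache-style access log line with grok and then tidy the result with mutate:

```
filter {
    # Extract structured, named fields from the raw message
    # (COMBINEDAPACHELOG is a built-in grok pattern)
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    # Rename a parsed field to a friendlier name
    mutate {
        rename => { "clientip" => "client_ip" }
    }
}
```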
Output
Logstash offers numerous output options to get data where it needs to go, with the flexibility to unlock numerous downstream use cases.
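For instance (host address assumed for this sketch), a pipeline can fan events out to several destinations at once:

```
output {
    # Index events into Elasticsearch
    elasticsearch {
        hosts => ["localhost:9200"]
    }
    # Also echo events to the console for debugging
    stdout {
        codec => rubydebug
    }
}
```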
Reading a custom log
Earlier, we read the nginx log with Filebeat. A log with a custom structure, however, needs to be parsed before it can be used, and that is where Logstash comes in: its powerful processing capabilities can handle all sorts of scenarios.
Log structure
2023-06-17 21:21:21|ERROR|1 读取数据出错|参数:id=1002
As you can see, the fields in the log are separated by "|", so we will need to split the data on that character when processing it.
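Before writing the Logstash config, you can preview the split in the shell; awk here only illustrates the idea, while the mutate split filter does the real work inside Logstash:

```shell
# Split the sample log line on "|" and print each numbered field,
# mirroring what the mutate/split filter will do in the pipeline.
line='2023-06-17 21:21:21|ERROR|读取数据出错|参数:id=1002'
echo "$line" | awk -F'|' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
# Field 2 is the log level: ERROR
```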
Write the configuration file
vim shengxia-pipeline.conf
Then add the following content
input {
    file {
        path => "/opt/elk/logs/app.log"
        start_position => "beginning"
    }
}
filter {
    mutate {
        split => { "message" => "|" }
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
Start up
# Start Logstash
./bin/logstash -f ./shengxia-pipeline.conf
Then insert some test data:
echo "2023-06-17 21:21:21|ERROR|读取数据出错|参数:id=1002" >> app.log
Logstash will then capture the data we just inserted, and the message will be split into fields.
Output to Elasticsearch
We can modify the configuration file to send our log records to Elasticsearch:
input {
    file {
        path => "/opt/elk/logs/app.log"
        start_position => "beginning"
    }
}
filter {
    mutate {
        split => { "message" => "|" }
    }
}
output {
    elasticsearch {
        hosts => ["192.168.40.150:9200","192.168.40.137:9200","192.168.40.138:9200"]
    }
}
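With no index configured, where events land depends on the Logstash version and its data-stream settings. If you want a predictable index name, you can set one explicitly; this is a sketch, and "app-log" is just an example name:

```
output {
    elasticsearch {
        hosts => ["192.168.40.150:9200","192.168.40.137:9200","192.168.40.138:9200"]
        # Write to a daily index; "app-log" is a hypothetical name
        index => "app-log-%{+YYYY.MM.dd}"
    }
}
```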
Then restart Logstash:
./bin/logstash -f ./shengxia-pipeline.conf
Then insert two pieces of data into the log record
echo "2023-06-17 21:57:21|ERROR|读取数据出错|参数:id=1002" >> app.log
echo "2023-06-17 21:58:21|ERROR|读取数据出错|参数:id=1003" >> app.log
Finally, you can see in Elasticsearch the two records we just inserted.
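To double-check from outside Kibana, you can search Elasticsearch directly. This is a sketch: the logstash-* index pattern assumes the default naming, which varies by version and data-stream settings.

```
GET /logstash-*/_search?q=ERROR
```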