Logstash Kafka input and Elasticsearch (ES) output configuration

Introduction to Logstash

Logstash is an open source data collection engine with real-time pipeline capabilities. It dynamically unifies and standardizes data from a variety of sources and sends it to the destination of your choice. Logstash was originally aimed mainly at log collection, but its functionality now extends far beyond that. Any type of event can be processed by Logstash and transformed with input, filter, and output plugins.

Logstash works by using a pipeline to collect, process, and output logs. The pipeline consists of three stages: inputs, filters, and outputs. Input plugins consume data from a source, filter plugins modify the data as you specify, and output plugins write the data to a destination.
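As a concrete illustration of these three stages, a minimal pipeline configuration might look like the following sketch. The file path, grok pattern, and index name are placeholders chosen for this example, not values from any particular setup:

input {
  file {
    # read events line by line from this log file
    path => "/var/log/app.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    # parse each line into structured fields; the pattern is only an example
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}

output {
  elasticsearch {
    # write the structured events to a local Elasticsearch instance
    hosts => "localhost:9200"
    index => "app-logs"
  }
}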

Logstash's inputs support a variety of options and can capture events from many common sources at the same time, such as logs, metrics, web applications, data stores, and various AWS services. As data travels from source to store, Logstash's filters can parse individual events, identify named fields to build structure, and transform them into a common format for easier, faster analysis and business value.

Logstash's outputs can also route data to different storage destinations as needed. Elasticsearch is the preferred output, but many other outputs are available.

Logstash is a powerful open source tool that can be used to process and transform data from various data sources in real time to provide support for data analysis and business decision-making.

Introduction to Kafka

Kafka is an open source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. It is a high-throughput, distributed publish-subscribe messaging system that can handle all of the action-stream data generated by users on a website. This kind of data is usually handled by log processing and log aggregation systems because of its throughput requirements. For log data that feeds offline analysis systems such as Hadoop but also has real-time processing constraints, Kafka is a feasible solution.

The purpose of Kafka is to unify online and offline message processing through Hadoop's parallel loading mechanism, and to provide real-time messages through the cluster.

Introduction to ES

ES refers to Elasticsearch, an open source distributed search engine built on Apache Lucene and exposed through a RESTful web interface. It is also a distributed document database in which every field is indexed and searchable. It can scale out horizontally to hundreds of servers and can store, search, and analyze petabytes of data in a very short time. It is usually used as the core engine in scenarios that require complex search.

Logstash input and output configuration

Logstash's input and output configuration mainly consists of settings for its input and output plugins. Here are some common plugin configuration examples:

Input configuration:

  1. file: Read log information from a file, for example:

input {
  file {
    path => "/var/log/error.log"
    type => "error"
    start_position => "beginning"
  }
}

  2. stdin: Read log information from standard input, for example:

input {
  stdin {
  }
}

  3. syslog: Read log information from the system log, for example:

input {
  syslog {
    type => "syslog"
  }
}

Output configuration:

  1. stdout: Output log information to standard output, for example:

output {
  stdout {
  }
}

  2. elasticsearch: Output log information to the Elasticsearch cluster, for example:

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "myindex"
  }
}

The above are some common input and output plug-in configuration examples. Logstash also supports a variety of other input and output plug-ins, which can be selected and configured according to specific needs.
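As a quick way to try these plugins out before wiring them to a real destination, the separate snippets above can be combined into one small pipeline. The sketch below, with an illustrative file path, reads a log file and prints each event to standard output:

input {
  file {
    path => "/var/log/error.log"
    start_position => "beginning"
  }
}

output {
  stdout {
    # pretty-print each event, which is convenient for debugging a pipeline
    codec => rubydebug
  }
}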

Logstash Kafka input and Elasticsearch output configuration

The input stage of Logstash can read data from Kafka through the kafka plugin, and the output stage can write data to an Elasticsearch cluster through the elasticsearch plugin. Here is an example configuration:

input {
  kafka {
    bootstrap_servers => "your_kafka_server:9092"
    client_id => "your_client_id"
    group_id => "your_group_id"
    auto_offset_reset => "latest"
    consumer_threads => 1
    decorate_events => true
    topics => ["your_topic"]
  }
}

output {
  if [@metadata][kafka][topic] == "your_topic" {
    elasticsearch {
      hosts => "your_elasticsearch_server:9200"
      index => "your_index"
      timeout => 300
    }
  }
}

In this configuration, Logstash reads data from the specified Kafka server and topic through the Kafka plugin, and then writes the data to the specified Elasticsearch index through the Elasticsearch plugin. You can modify the parameters in the configuration according to the actual situation, such as the address of the Kafka server, client ID, group ID, topic, etc.

The meaning of the above configuration parameters is as follows:
  1. bootstrap_servers: This is the address and port of the Kafka server. You need to provide the address of at least one server in the Kafka cluster.
  2. client_id: This is the client's unique identifier and is used to identify the client connected to the Kafka cluster.
  3. group_id: This is the ID of the consumer group. If you have multiple Logstash instances reading from the same Kafka topic and you want to process them as a consumer group, then you need to use this parameter.
  4. auto_offset_reset: This parameter determines what Logstash should do when it cannot find the offset it previously read. Setting to "latest" means reading starts from the latest record.
  5. consumer_threads: This is the number of threads used to consume Kafka messages. Increasing the number of threads can speed up data reading, but will also increase CPU and memory usage.
  6. decorate_events: If set to true, Logstash will add extra Kafka metadata to each event, such as the topic, partition, and offset, under the [@metadata][kafka] field (see the sketch below).
  7. topics: This is the list of Kafka topics to be read by Logstash.
  8. if [@metadata][kafka][topic] == "your_topic": This is a conditional statement that determines whether to send the event to Elasticsearch. Events will be sent to Elasticsearch only if their topic matches the specified "your_topic".
  9. hosts: This is the address and port of the Elasticsearch cluster.
  10. index: This is the index name where Logstash writes data to Elasticsearch.
  11. timeout: This is the timeout (in seconds) for Logstash to communicate with the Elasticsearch cluster.

These parameters can be adjusted to match your specific data collection and processing requirements.
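Because decorate_events is set to true, the Kafka topic, partition, and offset of each event are available under the [@metadata][kafka] field and can be reused in the output stage. As a sketch (the date-suffixed index name is only an illustration), the elasticsearch output could build the index name from the topic the event came from:

output {
  elasticsearch {
    hosts => "your_elasticsearch_server:9200"
    # e.g. events read from "your_topic" on 2024-01-01 go to the index "your_topic-2024.01.01"
    index => "%{[@metadata][kafka][topic]}-%{+YYYY.MM.dd}"
  }
}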

Java example: sending messages to Kafka

Apache Kafka is a distributed stream processing platform that you can use to process all kinds of data. The following is sample code for sending messages to Kafka using Java:

First, you need to add the Apache Kafka dependency to your project. If you are using Maven, then you can add the following dependencies in the pom.xml file:

<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.8.0</version>
    </dependency>
</dependencies>

The following is sample code for sending a message using Java:

import org.apache.kafka.clients.producer.*;

import java.util.Properties;

public class ProducerDemo {

    public static void main(String[] args) throws Exception {
        // 1. Configure the producer client
        Properties props = new Properties();
        // Kafka cluster address
        props.put("bootstrap.servers", "your_kafka_server:9092");
        // Ack mode: "all" (equivalent to -1) waits until the leader and all in-sync replicas
        // have written the message; "1" acknowledges after only the leader has written it
        props.put("acks", "all");
        // Number of retries on transient send failures
        props.put("retries", 0);
        // Batch size in bytes
        props.put("batch.size", 16384);
        // Linger time in milliseconds; delaying sends slightly lets the producer batch records and improves throughput
        props.put("linger.ms", 1);
        // Total memory (bytes) available to the producer for buffering
        props.put("buffer.memory", 33554432);
        // Serializer class for message keys
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Serializer class for message values
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // 2. Create the producer with the configuration above
        Producer<String, String> producer = new KafkaProducer<>(props);
        for (int i = 0; i < 100; i++) {
            // 3. Create a record, specifying the topic, message key, and message value
            ProducerRecord<String, String> record = new ProducerRecord<>("your_topic", "key" + i, "value" + i);
            // 4. Send the record to the Kafka cluster and wait for the result
            RecordMetadata metadata = producer.send(record).get();
            // Print the partition and offset the record was written to
            System.out.printf("offset = %d, partition = %d%n", metadata.offset(), metadata.partition());
        }
        // 5. Close the producer and release its resources
        producer.close();
    }
}

In this example, we create a class named ProducerDemo that uses Kafka's producer API to send messages to a topic named "your_topic". Please note that you need to replace the value of the "bootstrap.servers" property with the actual address of your Kafka cluster. If your cluster is running locally on the default port, you can use "localhost:9092".
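Once this producer is running against the same topic that the Logstash kafka input subscribes to, the pipeline shown earlier will consume each message and index it into the configured Elasticsearch index, where it can then be searched.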

Commonly used input plug-ins for Logstash

Commonly used input plug-ins for Logstash include the following:

  1. file: This plugin can read events from a file. It uses the FileWatch library to monitor file changes and track the current reading position of the monitored log file to ensure that no data is missed.
  2. stdin: This plug-in is a standard input plug-in that can read events from the command line.
  3. TCP: Read data from a TCP connection (a minimal example is sketched after this list).
  4. UDP: Read data from the UDP socket.
  5. Redis: Read data from Redis.
  6. JDBC: Read data from a relational database.
  7. HTTP: Read data from the HTTP server.
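As one illustration, a minimal tcp input might look like the following sketch; the port number and codec are placeholders rather than recommended values:

input {
  tcp {
    # listen for incoming connections on this port
    port => 5000
    # parse the received data as JSON
    codec => json
  }
}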

Commonly used output plug-ins for Logstash

Commonly used output plug-ins in Logstash include the following:

  1. Elasticsearch: Output log data to Elasticsearch for subsequent search and analysis.
  2. Kafka: Send log data to a Kafka cluster for use by other consumers (a minimal example is sketched after this list).
  3. File: Output the log data to a file for subsequent viewing and auditing.
  4. Gelf: Output log data to a GELF-compatible server for remote monitoring and alerting.
  5. Fluentd: Output log data to Fluentd for unified log collection and forwarding.
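For the Kafka output mentioned above, a minimal configuration could look like the following sketch; the broker address and topic name are placeholders:

output {
  kafka {
    # Kafka broker(s) that the events are published to
    bootstrap_servers => "your_kafka_server:9092"
    # topic that receives the events
    topic_id => "your_topic"
    # serialize each event as JSON
    codec => json
  }
}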

Further reading

  • Logstash usage guide
  • Kafka usage guide
  • Elasticsearch usage guide
