Logstash: Deploy and scale Logstash

The Elastic Stack is used for a wide range of use cases, from operational log and metrics analysis to enterprise and application search. Ensuring that your data is scalable, durable, and securely transferred to Elasticsearch is important, especially for mission-critical environments.

The purpose of this document is to highlight the most common architectural patterns for Logstash and how to scale efficiently as your deployment grows. The focus will be on operational logging, metrics, and security analytics use cases, as they tend to require larger-scale deployments. The deployment and scaling recommendations provided here may vary based on your own requirements.

Getting started

If you are a first-time user who just wants to tail log files to grasp the power of the Elastic Stack, we recommend trying Filebeat Modules. For details on using Filebeat modules, see the article "Beats: Beats Getting Started Tutorial (2)". Filebeat modules enable you to quickly collect, parse, and index popular log types and view pre-built Kibana dashboards within minutes. The Metricbeat modules provide a similar experience, but with metric data. In this case, Beats send data directly to Elasticsearch, where Ingest Nodes process and index your data.
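
As a quick illustration, enabling a module and loading its prebuilt assets takes only a few commands. This is a minimal sketch; the system module is used here as an example, and any other module works the same way:

    # enable the system module (ships with Filebeat)
    filebeat modules enable system

    # load the index template and prebuilt Kibana dashboards
    filebeat setup -e

    # start shipping data, logging to stderr
    filebeat -e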

 

Introduction to Logstash

What are the key benefits of integrating Logstash into your architecture?

  • Scale through ingestion spikes - Logstash has an adaptive disk-based buffering system that absorbs incoming throughput, alleviating backpressure
  • Pull from other data sources such as databases, S3, or message queues
  • Send data to multiple destinations such as S3 or HDFS, or write to a file
  • Use conditional dataflow logic to compose more complex processing pipelines (see the pipeline sketch after this list)
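
A minimal pipeline sketch illustrating that last point, conditional dataflow logic. The plugins used (beats, mutate, elasticsearch, s3) are standard Logstash plugins, but the field name, hosts, bucket, and tag values are placeholders:

    input {
      beats { port => 5044 }
    }

    filter {
      # tag error events so they can be routed separately below
      if [log_level] == "ERROR" {
        mutate { add_tag => ["alert"] }
      }
    }

    output {
      if "alert" in [tags] {
        elasticsearch { hosts => ["https://es01:9200"] }
      } else {
        # archive everything else to S3 (credentials omitted here)
        s3 {
          bucket => "archive-bucket"
          region => "us-east-1"
        }
      }
    }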

For a comparison of Beats and Logstash, see the article "Beats: Introduction to Elastic Beats and Comparison with Logstash".

Scaling ingest

Beats and Logstash make ingest great. Together, they provide a comprehensive solution that is scalable and resilient. What can you expect?

  • Horizontal scalability, high availability, and variable load handling
  • Message persistence with at-least-once delivery guarantee
  • End-to-end secure transmission with authentication and wire encryption 

Beats and Logstash

Running on thousands of edge host servers, Beats collect, tail, and ship logs to Logstash. Logstash serves as a centralized streaming engine for data unification and enrichment. The Beats input plugin exposes a secure, acknowledgment-based endpoint for Beats to send data to Logstash.
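
A minimal input sketch for such an endpoint. The beats input plugin and its SSL options are standard, though option names can vary across plugin versions; the port and certificate paths are placeholders:

    input {
      beats {
        port => 5044
        # wire encryption on the Beats endpoint
        ssl => true
        ssl_certificate => "/etc/logstash/certs/logstash.crt"
        ssl_key => "/etc/logstash/certs/logstash.key"
      }
    }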

Note: Enabling persistent queues is strongly recommended, and the architectures described here assume that they are enabled. We encourage you to review the Persistent Queue (PQ) documentation for feature benefits and more details on resiliency.
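
Persistent queues are enabled in logstash.yml. A minimal sketch, with placeholder size and path values:

    # logstash.yml
    queue.type: persisted               # default is "memory"
    queue.max_bytes: 4gb                # cap on disk space used by the queue
    path.queue: /var/lib/logstash/queue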

Scalability

Logstash is horizontally scalable and can form groups of nodes running the same pipeline. Logstash's adaptive buffering facilitates smooth streaming even under variable throughput loads. If the Logstash layer becomes an ingestion bottleneck, simply add more nodes to scale out. Here are some general suggestions:

  • Beats should be load balanced across a set of Logstash nodes (see the example Filebeat configuration after this list).
  • A minimum of two Logstash nodes is recommended for high availability.
  • It is common to deploy only one Beats input per Logstash node, but it is also possible to deploy multiple Beats inputs per Logstash node to expose independent endpoints for different data sources.
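
On the Beats side, load balancing across a group of Logstash nodes looks like the following sketch in filebeat.yml; the hostnames are placeholders:

    # filebeat.yml
    output.logstash:
      hosts: ["logstash1:5044", "logstash2:5044", "logstash3:5044"]
      loadbalance: true   # distribute events across all listed hosts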

Resiliency

At-least-once delivery is guaranteed when Filebeat or Winlogbeat is used for log collection in this ingest flow. Both communication protocols, from Filebeat or Winlogbeat to Logstash and from Logstash to Elasticsearch, are synchronous and support acknowledgments. The other Beats do not yet support this mechanism.

Logstash persistent queues provide protection across node failures. For disk-level resiliency in Logstash, it is important to ensure disk redundancy. For local deployments, it is recommended that you configure RAID. When running in the cloud or in a containerized environment, it is recommended that you use persistent disks with a replication policy that reflects your data SLA.

Note: Make sure queue.checkpoint.writes: 1 is set for at-least-once guarantees. See the Persistent Queue durability documentation for more details.
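
In logstash.yml that looks like the sketch below. Note that checkpointing after every written event trades throughput for durability:

    # logstash.yml
    queue.type: persisted
    queue.checkpoint.writes: 1   # checkpoint after every written event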

Data processing

Logstash commonly extracts fields with grok or dissect, augments events with geographic information, and can further enrich them with lookup datasets from files, databases, or Elasticsearch. For more enrichment filters, refer to "Logstash: Enriching Data with Lookups". Be aware that processing complexity affects overall throughput and CPU utilization. Make sure to check out the other available filter plugins as well.
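
A minimal filter sketch combining field extraction and enrichment. COMBINEDAPACHELOG is a pattern bundled with the grok filter, and the geoip filter ships with Logstash; the clientip field name assumes the classic (non-ECS) output of that pattern:

    filter {
      grok {
        # parse a standard Apache access log line into named fields
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
      geoip {
        # add geographic information based on the client IP
        source => "clientip"
      }
    }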

Secure transmission

Enterprise-grade security is available across the entire delivery chain.

  • Wire encryption is recommended both for transfers from Beats to Logstash and from Logstash to Elasticsearch.
  • There are many security options when communicating with Elasticsearch, including basic authentication, TLS, PKI, LDAP, AD, and other custom realms. To enable Elasticsearch security, see Securing a Cluster (an example output configuration follows this list).
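
A sketch of a secured Elasticsearch output. The user, password, and cacert settings are standard elasticsearch output plugin options, though option names vary across plugin versions; the credentials and paths are placeholders, and ${ES_PWD} is resolved from the Logstash keystore:

    output {
      elasticsearch {
        hosts    => ["https://es01:9200"]
        user     => "logstash_writer"              # placeholder user
        password => "${ES_PWD}"                    # pulled from the keystore
        cacert   => "/etc/logstash/certs/ca.crt"   # CA for TLS verification
      }
    }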

Monitoring

When running Logstash 5.2 or later, the monitoring UI provides deep visibility into deployment metrics, helping to observe performance and alleviate bottlenecks when scaling. Monitoring is an X-Pack feature under the base license, so it is free to use. To get started, see Monitoring Logstash .

If external monitoring is preferred, a monitoring API that returns point-in-time snapshots of metrics is available.
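
For example, the API listens on port 9600 by default and can be queried directly:

    # point-in-time pipeline statistics from the local Logstash instance
    curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'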

Add other popular sources

Users may have other mechanisms for collecting log data, and it is easy to integrate and centralize them into the Elastic Stack. Let's look at a few scenarios:

 

TCP, UDP, and HTTP protocols

TCP, UDP, and HTTP protocols are common methods of feeding data into Logstash. Logstash can expose endpoint listeners with corresponding TCP, UDP, and HTTP input plugins. The data sources listed below are typically obtained through one of these three protocols.
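
A minimal sketch exposing all three listeners; the ports are placeholders:

    input {
      tcp  { port => 5000 }
      udp  { port => 5001 }
      http { port => 8080 }
    }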

Note : The TCP and UDP protocols do not support application-level acknowledgments, so connection problems may result in data loss.

For high availability scenarios, a third-party hardware or software load balancer, such as HAProxy, should be added to fan out traffic to a set of Logstash nodes.

Network and Security Data

While Beats may already meet your data ingestion use case, network and security datasets come in many forms. Let's talk about some other ingestion points.

  • Network wire data - collect and analyze network traffic with Packetbeat.
  • Netflow v5/v9/v10 - Logstash understands data from Netflow/IPFIX exporters with the Netflow codec (see the sketch after this list).
  • Nmap - Logstash accepts and parses Nmap XML data with the Nmap codec.
  • SNMP trap - Logstash has a native SNMP trap input.
  • CEF - Logstash accepts and parses CEF data from systems such as ArcSight SmartConnectors with the CEF codec. See this blog series for more details.
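
A sketch of a Netflow listener using the Netflow codec; the port is a placeholder (2055 is a common export port):

    input {
      udp {
        port  => 2055
        codec => netflow
      }
    }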

Infrastructure and application data and IoT

Infrastructure and application metrics can be collected with Metricbeat, but applications can also send webhooks to the Logstash HTTP input, or metrics can be polled from HTTP endpoints with the HTTP poller input plugin.
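
A sketch of the HTTP poller input polling a metrics endpoint; the URL and interval are placeholders:

    input {
      http_poller {
        urls => {
          app_metrics => "http://localhost:8080/metrics"
        }
        schedule => { every => "30s" }   # poll every 30 seconds
      }
    }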

For applications that log with log4j2, it is recommended to use the SocketAppender to send JSON to the Logstash TCP input. Alternatively, log4j2 can also log to a file that is collected with Filebeat. The log4j1 SocketAppender is deprecated.
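
A sketch of the receiving side, assuming the appender emits newline-delimited JSON; the port is a placeholder and must match the appender configuration:

    input {
      tcp {
        port  => 4560
        codec => json_lines   # one JSON event per line
      }
    }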

IoT devices such as Raspberry Pis, smartphones, and connected vehicles often send telemetry data through one of these protocols.

Integrate with message queues

If you're leveraging message queuing technology as part of your existing infrastructure, it's easy to get that data into the Elastic Stack. For existing users who use an external queuing layer such as Redis or RabbitMQ just for Logstash data buffering, it is recommended to use Logstash persistent queues instead of the external queuing layer. This will help simplify management overall by removing unnecessary layers of complexity from the ingestion architecture.

For users who want to integrate data from existing Kafka deployments or require the underlying use of ephemeral storage, Kafka can serve as a data hub where Beats can persist to and Logstash nodes can consume from.

Other TCP, UDP, and HTTP sources can persist to Kafka, with Logstash serving as a conduit, to achieve high availability in lieu of a load balancer. A group of Logstash nodes can then consume from the topic with the Kafka input to further transform and enrich the data in transit.
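
A sketch of the consuming side. The bootstrap servers, topic, and group id are placeholders; Logstash nodes sharing the same group_id split the topic's partitions among themselves, which is how this tier scales out:

    input {
      kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092"
        topics            => ["beats-events"]
        group_id          => "logstash-consumers"
      }
    }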

Resiliency and recovery

When Logstash consumes from Kafka, persistent queues should be enabled; they add transport resiliency and mitigate the need for reprocessing during Logstash node failures. In this case, it is recommended to use the default persistent queue disk allocation size, queue.max_bytes: 1GB.

If Kafka is configured to retain data for a long time, data can be reprocessed from Kafka in case of disaster recovery and reconciliation.

Other message queue integrations

While no additional queuing layer is required, Logstash can consume from a multitude of other message queuing technologies such as RabbitMQ and Redis. It also supports ingestion from hosted queuing services such as Pub/Sub, Kinesis, and SQS.
