Understand Elasticsearch again

  • Overview

    A module in Filebeat is a packaged way to parse the specific log file format of a particular piece of software.

  • Pipeline

    A pipeline is a definition of a series of processors that are to be executed in the same order as they are declared.

    A pipeline consists of two main fields:

    • a description

      The description is a special field that stores a helpful description of what the pipeline does.

    • a list of processors

      The processors parameter defines a list of processors to be executed in order.

    {
      "description": "...",
      "processors": [ ... ]
    }
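
    A minimal concrete sketch, assuming a hypothetical pipeline id my-pipeline and a rename processor whose field names are only for illustration:

    PUT _ingest/pipeline/my-pipeline
    {
      "description": "Move the raw source address into source.ip",
      "processors": [
        {
          "rename": {
            "field": "src_ip",
            "target_field": "source.ip"
          }
        }
      ]
    }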
    
  • Painless scripting language

    Painless Guide

    Painless Language Specification

    Understand ANTLR4 & ASM

    Understand inline scripts vs. stored scripts

    Painless is a simple, secure scripting language designed specifically for use with Elasticsearch.

    It is the default scripting language for Elasticsearch and can safely be used for inline and stored scripts.
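
    A brief sketch of the two forms, assuming a hypothetical index my-index with a numeric counter field and document id 1:

    # Inline script: the Painless source is sent with the request
    POST my-index/_update/1
    {
      "script": {
        "lang": "painless",
        "source": "ctx._source.counter += params.count",
        "params": { "count": 1 }
      }
    }

    # Stored script: saved once under an id, then referenced by that id
    PUT _scripts/increment-counter
    {
      "script": {
        "lang": "painless",
        "source": "ctx._source.counter += params.count"
      }
    }

    POST my-index/_update/1
    {
      "script": {
        "id": "increment-counter",
        "params": { "count": 1 }
      }
    }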

  • Mapping

    Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.

    A mapping definition has:

    • Metadata fields

      Metadata fields are used to customize how a document’s associated metadata is treated.

      Examples of metadata fields include the document’s _index, _id, and _source fields.

    • Fields

      A mapping contains a list of fields or properties pertinent to the document.

      Each field has its own data type.

    Defining too many fields in an index can lead to a mapping explosion, which can cause out-of-memory errors and situations that are difficult to recover from.

    There are two ways to implement mapping:

    • Dynamic mapping

      One of the most important features of Elasticsearch is that it tries to get out of your way and let you start exploring your data as quickly as possible.

      To index a document, you don’t have to first create an index, define a mapping type, and define your fields; you can just index a document and the index, type, and fields will spring to life automatically.

      The automatic detection and addition of new fields is called dynamic mapping.

    • Explicit mapping

      With explicit mapping you define the fields and their data types yourself when you create the index (or update its mapping), as in the sketch below.
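
      A minimal sketch of an explicit mapping, assuming a hypothetical index my-index and illustrative field names:

      PUT my-index
      {
        "mappings": {
          "properties": {
            "age":        { "type": "integer" },
            "email":      { "type": "keyword" },
            "created_at": { "type": "date" }
          }
        }
      }

      A document indexed with a field that is not listed here still falls back to dynamic mapping unless dynamic mapping is disabled for the index.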

  • Data replication model

    Each index in Elasticsearch is divided into shards and each shard can have multiple copies.

    These copies are known as a replication group and must be kept in sync when documents are added or removed.

    The process of keeping the shard copies in sync and serving reads from them is what we call the data replication model.

    This model is based on having a single copy from the replication group act as the primary shard; the other copies are called replica shards.
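
    Shard and replica counts are index settings; a minimal sketch, again using a hypothetical my-index:

    PUT my-index
    {
      "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1
      }
    }

    With these settings each of the three primary shards gets one replica, so every replication group contains two shard copies.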

  • Ingest node

    The built-in Filebeat modules almost entirely use the ingest node feature of Elasticsearch instead of the Beats processors.

    One of the most helpful parts of the ingest pipeline is the ability to debug by using the Simulate Pipeline API.

    The simulate pipeline API executes a specific pipeline against a set of documents provided in the body of the request.

    You can either specify an existing pipeline to execute against the provided documents or supply a pipeline definition in the body of the request.

    You can use the simulate pipeline API to see how each processor affects the ingest document as it passes through the pipeline. To see the intermediate results of each processor in the simulate request, you can add the verbose parameter to the request.
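
    A sketch of a verbose simulate request, using an inline pipeline definition with a single lowercase processor and a made-up test document:

    POST _ingest/pipeline/_simulate?verbose=true
    {
      "pipeline": {
        "description": "test pipeline",
        "processors": [
          { "lowercase": { "field": "message" } }
        ]
      },
      "docs": [
        { "_source": { "message": "HELLO WORLD" } }
      ]
    }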

    The ingest pipeline works at the document level; for multiline events such as exceptions, you still need to handle them where the logs are generated and let Filebeat create a single message out of them.

  • Suricata fields

    Understand suricata.eve.timestamp
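
    The Filebeat Suricata module keeps the EVE timestamp under suricata.eve.timestamp; a sketch of how an ingest date processor could copy it into @timestamp, assuming an ISO8601 value (the processor actually shipped with the module may differ):

    {
      "date": {
        "field": "suricata.eve.timestamp",
        "target_field": "@timestamp",
        "formats": [ "ISO8601" ]
      }
    }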

