Elasticsearch: Redact processor

Warning: This feature is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

The Redact processor uses the Grok rules engine to obscure text in the input document that matches the given Grok patterns. The processor can be used to hide personally identifiable information (PII) by configuring it to detect known patterns such as email or IP addresses. Text that matches a Grok pattern is replaced with a configurable string, such as <EMAIL> where an email address is matched, or, if you prefer, every match can simply be replaced with the text <REDACTED>.

Elasticsearch ships with a number of useful predefined patterns that the Redact processor can reference directly. If none of them meets your needs, create a new pattern with a custom pattern definition. The Redact processor replaces every occurrence of a match. If there are multiple matches, all of them are replaced with the pattern name.
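As a rough illustration of the replacement behaviour described above, the following Python sketch mimics it with plain regular expressions. It is not how Elasticsearch implements the processor: the redact function and the two simplified expressions in SIMPLE_PATTERNS are assumptions made for this example and are much looser than the real Grok patterns.

```python
import re

# Hypothetical, simplified stand-ins for Grok patterns (assumptions for this sketch).
SIMPLE_PATTERNS = {
    "IP": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
    "EMAILADDRESS": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
}


def redact(text, captures, prefix="<", suffix=">"):
    """Replace every match of each pattern with prefix + name + suffix.

    `captures` maps a key in SIMPLE_PATTERNS to the replacement name,
    loosely mirroring the %{PATTERN:name} syntax.
    """
    for pattern_key, name in captures.items():
        text = re.sub(SIMPLE_PATTERNS[pattern_key], prefix + name + suffix, text)
    return text


# Both IP addresses are replaced, because every occurrence of a match is redacted.
print(redact("55.3.244.1 called 10.0.0.7", {"IP": "client"}))
# prints: <client> called <client>
```

The point of the sketch is only that every occurrence is replaced and that the replacement text is the capture name wrapped in the prefix and suffix, not the pattern's own name.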

The Redact processor is compatible with Elastic Common Schema (ECS) patterns. Legacy Grok patterns are not supported.

Using Redact Processors in Pipelines

Redact options
Name                 Required  Default  Description
field                yes       -        The field to be redacted.
patterns             yes       -        A list of Grok expressions to match; the named captures are redacted.
pattern_definitions  no        -        A map of pattern-name and pattern tuples defining custom patterns to be used by the processor. Patterns matching existing names override the pre-existing definition.
prefix               no        <        The token that starts a redacted section.
suffix               no        >        The token that ends a redacted section.
ignore_missing       no        true     If true and the field does not exist or is null, the processor exits quietly without modifying the document.
description          no        -        A description of the processor. Useful for describing the purpose of the processor or its configuration.
if                   no        -        Conditionally execute the processor. See Running Processors Conditionally.
ignore_failure       no        false    Ignore failures for the processor. See Handling Pipeline Failures.
on_failure           no        -        Handle failures for the processor. See Handling Pipeline Failures.
tag                  no        -        An identifier for the processor. Useful for debugging and metrics.
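To make the options concrete, here is a minimal Python sketch that assembles a redact processor definition as a dictionary ready to be serialized to JSON. The helper name build_redact_processor is hypothetical and not part of any Elasticsearch client; the option names and defaults are the ones listed above, and the helper only emits optional settings that differ from those defaults.

```python
import json


def build_redact_processor(field, patterns, pattern_definitions=None,
                           prefix="<", suffix=">", ignore_missing=True, tag=None):
    """Return a dict for one `redact` processor entry (hypothetical helper)."""
    if not patterns:
        raise ValueError("patterns is required and must not be empty")
    body = {"field": field, "patterns": list(patterns)}
    if pattern_definitions:
        body["pattern_definitions"] = dict(pattern_definitions)
    # Only emit optional settings that differ from the documented defaults.
    if prefix != "<":
        body["prefix"] = prefix
    if suffix != ">":
        body["suffix"] = suffix
    if not ignore_missing:
        body["ignore_missing"] = False
    if tag is not None:
        body["tag"] = tag
    return {"redact": body}


processor = build_redact_processor("message", ["%{IP:client}"], prefix="*", suffix="*")
print(json.dumps(processor, indent=2))
```

The resulting dictionary can be placed in the processors array of a pipeline body such as the ones used in the simulate requests in this article.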

Example

In this example, the predefined IP Grok pattern is used to match and redact an IP address in the message text field. The pipeline is tested using the Simulate API.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description" : "Hide my IP",
    "processors": [
      {
        "redact": {
          "field": "message",
          "patterns": ["%{IP:client}"]
        }
      }
    ]
  },
  "docs":[
    {
      "_source": {
        "message": "55.3.244.1 GET /index.html 15824 0.043"
      }
    }
  ]
}

The result displayed by the above command is:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "message": "<client> GET /index.html 15824 0.043"
        },
        "_ingest": {
          "timestamp": "2023-06-24T01:53:44.906188Z"
        }
      }
    }
  ]
}

The document in the response still contains the message field, but now the IP address 55.3.244.1 is replaced by the text <client>.

The IP address is replaced with the word client because that is the name given in the Grok pattern %{IP:client}. The < and > tokens that surround the pattern name can be configured using the prefix and suffix options.

The next example defines multiple patterns, both of which are replaced with the word REDACTED, and sets the prefix and suffix options to *:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Hide my IP",
    "processors": [
      {
        "redact": {
          "field": "message",
          "patterns": [
            "%{IP:REDACTED}",
            "%{EMAILADDRESS:REDACTED}"
          ],
          "prefix": "*",
          "suffix": "*"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "55.3.244.1 GET /index.html 15824 0.043 test@elastic.co"
      }
    }
  ]
}

The result of running the above command is shown below. In the response, both the IP address 55.3.244.1 and the email address in the message have been replaced with *REDACTED*:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "message": "*REDACTED* GET /index.html 15824 0.043 *REDACTED*"
        },
        "_ingest": {
          "timestamp": "2023-06-24T01:56:07.547294Z"
        }
      }
    }
  ]
}

Custom patterns

If none of the existing Grok patterns meets your requirements, you can add custom patterns using the pattern_definitions option. A new pattern definition consists of a pattern name and the pattern itself. The pattern can be a regular expression or can reference an existing Grok pattern.

This example defines a custom pattern GITHUB_NAME to match GitHub usernames. The pattern definition uses the existing USERNAME Grok pattern, prefixed with the literal @.

Tip : The Grok debugger is a very useful tool for building custom patterns.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "redact": {
          "field": "message",
          "patterns": [
            "%{GITHUB_NAME:GITHUB_NAME}"
          ],
          "pattern_definitions": {
            "GITHUB_NAME": "@%{USERNAME}"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "@elastic-data-management the PR is ready for review"
      }
    }
  ]
}

The username is redacted in the response:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "message": "<GITHUB_NAME> the PR is ready for review"
        },
        "_ingest": {
          "timestamp": "2023-06-24T01:59:15.427469Z"
        }
      }
    }
  ]
}
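The way a custom definition can reference an existing pattern can be sketched in Python as a recursive expansion of %{NAME} references into one regular expression. This is a simplified illustration, not the Grok engine: the expand function is hypothetical, and the USERNAME expression is assumed to match the commonly shipped Grok definition.

```python
import re

# Assumed base pattern; the expression mirrors the usual Grok USERNAME definition.
BASE_PATTERNS = {"USERNAME": r"[a-zA-Z0-9._-]+"}

# Matches %{NAME} or %{NAME:capture}; only NAME is used for the lookup.
REFERENCE = re.compile(r"%\{(\w+)(?::\w+)?\}")


def expand(pattern, definitions):
    """Recursively replace %{NAME} references with their regex source.

    Circular definitions are not detected in this sketch.
    """
    def substitute(match):
        return "(?:" + expand(definitions[match.group(1)], definitions) + ")"
    return REFERENCE.sub(substitute, pattern)


# Custom definitions are merged over the base ones, so equal names override.
definitions = {**BASE_PATTERNS, "GITHUB_NAME": "@%{USERNAME}"}
regex = expand("%{GITHUB_NAME:GITHUB_NAME}", definitions)

message = "@elastic-data-management the PR is ready for review"
print(re.sub(regex, "<GITHUB_NAME>", message))
# prints: <GITHUB_NAME> the PR is ready for review
```

Merging the custom definitions over the base ones reflects the rule that a definition with an existing name overrides the pre-existing one.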

Grok watchdog

The watchdog interrupts expressions that take too long to execute. When an expression is interrupted, the Redact processor fails with an error. The same settings that control the Grok watchdog timeout also apply to the Redact processor.

Origin blog.csdn.net/UbuntuTouch/article/details/131358130