Debezium Series: Message Filtering

1. Message filtering

By default, Debezium passes every data change event it receives to the Kafka broker. However, in many cases you may only be interested in a subset of the events emitted by the producer. To enable you to process only the records that are relevant to you, Debezium provides the filter Single Message Transformation (SMT).

Notice:

  • Filter SMT is under active development. The structure or other details of the messages emitted may change as development progresses.

While it is possible to create custom SMTs in Java to code filtering logic, using custom coded SMTs has its drawbacks. For example:

  • It is necessary to pre-compile the transformation and deploy it to Kafka Connect.
  • Every change requires the code to be recompiled and redeployed, which makes operations inflexible.

The filter SMT supports scripting languages that integrate with JSR 223 (Scripting for the Java™ Platform).

Debezium does not ship with any implementation of the JSR 223 API. To use an expression language with Debezium, you must download the JSR 223 script engine implementation for that language. For example, for Groovy 3, you can download its JSR 223 implementation from https://groovy-lang.org/. The JSR 223 implementation of GraalVM JavaScript is available at https://github.com/graalvm/graaljs. Once you have the script engine files, add them to the Debezium connector plugin directory, along with any other JAR files used by the language implementation.

2. Deployment

For security reasons, the filter SMT is not included in the Debezium connector archive. Instead, it is provided in a separate artifact, debezium-scripting-2.2.1.Final.tar.gz.

To use the filter SMT with a Debezium connector plugin, you must explicitly add the SMT artifact to your Kafka Connect environment. Important: Once the filter SMT is present in a Kafka Connect instance, any user who is allowed to add connectors to the instance can run script expressions. To ensure that script expressions can be run only by authorized users, secure the Kafka Connect instance and its configuration interface before you add the filter SMT.

After installing Zookeeper, Kafka, Kafka Connect, and one or more Debezium connectors, the remaining tasks to install the filter SMT are:

  • Download the scripting SMT archive.
  • Extract the contents of the archive into the Debezium plugin directory of the Kafka Connect environment.
  • Get the JSR 223 script engine implementation and add its contents to the Debezium plugin directory of the Kafka Connect environment.
  • Restart the Kafka Connect process to pick up the new JAR files.
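
As a rough sketch of where these files end up: Kafka Connect discovers plugins in the directories listed in the worker's plugin.path property. The path and directory layout below are hypothetical and only illustrate the idea; use the plugin directory of your own installation.

```properties
# Kafka Connect worker configuration (for example, connect-distributed.properties).
# /opt/kafka/connect is a hypothetical location, not a required one.
plugin.path=/opt/kafka/connect

# Assumed layout after extracting the archives (names are illustrative):
#   /opt/kafka/connect/debezium-connector-mysql/
#       connector JARs
#       debezium-scripting JARs (from the scripting SMT archive)
#       JSR 223 engine JARs (for example, groovy and groovy-jsr223)
```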

The Groovy language requires the following libraries on the classpath:

  • groovy
  • groovy-json (optional)
  • groovy-jsr223

The JavaScript language requires the following libraries on the classpath:

  • graalvm.js
  • graalvm.js.scriptengine

3. Example: basic configuration

You configure the filter transformation in the Kafka Connect configuration of the Debezium connector. In the configuration, you specify the events that you are interested in by defining filter conditions based on your business rules. As the filter SMT processes the event stream, it evaluates each event against the configured filter conditions. Only events that meet the conditions are passed to the broker.

The configuration of the filter SMT requires you to specify an expression, written in the configured scripting language, that defines the filter condition.

For example, you can add the following configuration in your connector configuration.

transforms=filter
transforms.filter.type=io.debezium.transforms.Filter
transforms.filter.language=jsr223.groovy
transforms.filter.condition=value.op == 'u' && value.before.id == 2

The preceding example specifies the use of the Groovy expression language. The filter condition value.op == 'u' && value.before.id == 2 removes all messages except those that represent update (u) records with an id value equal to 2.

Custom configuration
The preceding example shows a simple SMT configuration that is designed to process only DML events, which contain an op field. Other types of messages that a connector might emit (heartbeat messages, tombstone messages, or metadata messages about schema changes and transactions) do not contain this field. To avoid processing failures, you can define an SMT predicate statement that applies the transformation only to specific events.
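
A minimal sketch of such a predicate, assuming the connector uses the default heartbeat topic prefix (__debezium-heartbeat). The predicate alias isHeartbeat is an arbitrary, hypothetical name; TopicNameMatches is one of the predicates that ships with Kafka Connect.

```properties
transforms=filter
transforms.filter.type=io.debezium.transforms.Filter
transforms.filter.language=jsr223.groovy
transforms.filter.condition=value.op == 'u' && value.before.id == 2
# Apply the filter only to records for which the predicate is NOT true,
# that is, skip records on heartbeat topics.
transforms.filter.predicate=isHeartbeat
transforms.filter.negate=true

predicates=isHeartbeat
predicates.isHeartbeat.type=org.apache.kafka.connect.transforms.predicates.TopicNameMatches
predicates.isHeartbeat.pattern=__debezium-heartbeat.*
```

Records that the predicate excludes bypass the filter and reach the broker unchanged. Schema change or transaction metadata topics would need their own predicate or a broader pattern.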

4. Variables used in filter expressions

Debezium binds certain variables into the evaluation context of the filter SMT. When you create expressions to specify filter conditions, you can use these bound variables. Binding the variables enables the SMT to look up and interpret their values when it evaluates the conditions in an expression.

The following table lists the variables that Debezium binds to the evaluation context of the filter SMT:
Table 1. Filter expression variables

| Name | Description | Type |
| --- | --- | --- |
| key | The key of the message. | org.apache.kafka.connect.data.Struct |
| value | The value of the message. | org.apache.kafka.connect.data.Struct |
| keySchema | Schema of the message key. | org.apache.kafka.connect.data.Schema |
| valueSchema | Schema of the message value. | org.apache.kafka.connect.data.Schema |
| topic | The name of the target topic. | String |
| headers | A Java map of message headers. The map key is the header name. Each header exposes the following properties: value (of type Object) and schema (of type org.apache.kafka.connect.data.Schema). | java.util.Map<String, io.debezium.transforms.scripting.RecordHeader> |

Expressions can call arbitrary methods on their variables. An expression should resolve to a Boolean value that determines how the SMT handles the message. When the filter condition evaluates to true, the message is retained. When the filter condition evaluates to false, the message is removed.

Expressions should not cause any side effects. That is, they should not modify any of the variables that are passed to them.
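
As an illustration of using more than one bound variable, the following sketch combines topic and value in a Groovy condition. The topic name dbserver1.inventory.customers is hypothetical.

```properties
transforms=filter
transforms.filter.type=io.debezium.transforms.Filter
transforms.filter.language=jsr223.groovy
# Keep events on the (hypothetical) customers topic unless they are deletes.
# For any other topic the first comparison is false, so value.op is never
# read and the message is removed.
transforms.filter.condition=topic == 'dbserver1.inventory.customers' && value.op != 'd'
```

Placing the topic check first means that messages without an op field, such as heartbeats on other topics, are rejected before the expression touches value.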

5. Option to selectively apply transformations

In addition to the change event messages that the Debezium connector emits when database changes occur, the connector also emits other types of messages, including heartbeat messages and metadata messages about schema changes and transactions. Because the structure of these other messages differs from that of the change event messages that SMT is designed to handle, it is best to configure the connector to apply SMT selectively so that it only processes the expected data change messages. You can configure a connector to selectively apply SMT using one of the following methods:

  • Configure an SMT predicate for the transformation.
  • Use SMT's topic.regex configuration option.
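
A minimal sketch of the topic.regex approach, assuming data change topics named dbserver1.inventory.<table> (hypothetical names). Events on topics that do not match the pattern pass through the SMT unchanged.

```properties
transforms=filter
transforms.filter.type=io.debezium.transforms.Filter
transforms.filter.language=jsr223.groovy
# Evaluate the condition only for topics of the (hypothetical) inventory database.
# [.] matches a literal dot and avoids backslash escaping in a properties file.
transforms.filter.topic.regex=dbserver1[.]inventory[.].*
transforms.filter.condition=value.op == 'u' && value.before.id == 2
```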

6. Language Details

The way you express your filter depends on the scripting language you use.

For example, as shown in the basic configuration example, when you use Groovy as the expression language, the following expression removes all messages except for update records that have an id value of 2:

value.op == 'u' && value.before.id == 2

Other languages use different methods to express the same condition.

Note:
The Debezium MongoDB connector emits after and patch fields as serialized JSON documents rather than structures.
To use the filter SMT with the MongoDB connector, you must first unwind the array fields in the JSON into separate documents.
This can be done by applying the MongoDB ExtractNewDocumentState SMT.

It is also possible to use a JSON parser in an expression to generate a separate output document for each array item.
For example, if using Groovy as your expression language, add the groovy-json artifact to your classpath, then add an expression such as (new groovy.json.JsonSlurper()).parseText(value.after).last_name == 'Kretchmar'.
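
In connector configuration, the JSON parser approach might look like the following sketch. The null check on value.after is an added assumption, intended to avoid a parse error on events that carry no after document, such as deletes.

```properties
transforms=filter
transforms.filter.type=io.debezium.transforms.Filter
transforms.filter.language=jsr223.groovy
# Requires the groovy-json library on the classpath.
# The null guard is an assumption added for events without an after field.
transforms.filter.condition=value.after != null && (new groovy.json.JsonSlurper()).parseText(value.after).last_name == 'Kretchmar'
```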

JavaScript
If you use JavaScript as the expression language, you can call the Struct#get() method to specify the filter conditions, as in the following example:

value.get('op') == 'u' && value.get('before').get('id') == 2

JavaScript with Graal.js
If you use JavaScript with Graal.js to define filter conditions, you use a method similar to the method you use in Groovy. For example:

value.op == 'u' && value.before.id == 2
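
For reference, a sketch of the same filter configured for Graal.js. It differs from the Groovy configuration only in the language value, and it assumes that the GraalVM JavaScript libraries are already on the classpath.

```properties
transforms=filter
transforms.filter.type=io.debezium.transforms.Filter
# Select the Graal.js engine instead of Groovy.
transforms.filter.language=jsr223.graal.js
transforms.filter.condition=value.op == 'u' && value.before.id == 2
```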

7. Configuration options

The following table lists the configuration options available for filter SMT.

Table 2. Filter SMT configuration options

| Property | Default | Description |
| --- | --- | --- |
| topic.regex | (none) | An optional regular expression that is evaluated against the name of the event's target topic to determine whether to apply the filtering logic. If the topic name matches the value of topic.regex, the transformation applies the filter logic before passing the event to the topic. If the topic name does not match, the SMT passes the event to the topic unchanged. |
| language | (none) | The language in which the expression is written. Must begin with jsr223., for example, jsr223.groovy or jsr223.graal.js. Debezium supports bootstrapping only through the JSR 223 API (Scripting for the Java™ Platform). |
| condition | (none) | The expression to evaluate for each message. Must evaluate to a Boolean value, where a result of true keeps the message and a result of false removes it. |
| null.handling.mode | keep | Specifies how the transformation handles null (tombstone) messages. You can specify one of the following options: keep (default): pass the message through. drop: remove the message completely. evaluate: apply the filter condition to the message. |
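
As a short illustration of the null.handling.mode option, the following sketch discards tombstone messages instead of passing them through. Whether dropping tombstones is appropriate depends on how downstream consumers and log compaction use them.

```properties
transforms=filter
transforms.filter.type=io.debezium.transforms.Filter
transforms.filter.language=jsr223.groovy
transforms.filter.condition=value.op == 'u' && value.before.id == 2
# Remove tombstone (null value) messages rather than keeping them (the default).
transforms.filter.null.handling.mode=drop
```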

Origin blog.csdn.net/zhengzaifeidelushang/article/details/131270122