Debezium Series: Content-Based Routing

1. The concept of content-based routing

By default, Debezium streams all of the change events that it reads from a table to a single static topic. In some cases, however, you might want to reroute selected events to other topics based on the event content. The process of routing messages based on their content is described in the content-based routing messaging pattern. To apply this pattern in Debezium, you use the content-based routing single message transform (SMT) to write expressions that are evaluated for each event.

Depending on how an event is evaluated, the SMT either routes the event message to the original destination topic or reroutes it to the topic that the expression specifies.

Notice:

  • The content-based routing SMT is under active development. The structure of the emitted messages, or other details, may change as development progresses.

While it is possible to create custom SMTs in Java to code routing logic, using custom-coded SMTs has drawbacks. For example:

  • You must compile the transformation in advance and deploy it to Kafka Connect.
  • Every change requires recompiling and redeploying the code, which makes operations inflexible.

The content-based routing SMT supports scripting languages that integrate with JSR 223 (Scripting for the Java™ Platform).

Debezium does not ship with any implementation of the JSR 223 API. To use an expression language with Debezium, you must download the JSR 223 script engine implementation for that language. For example, for Groovy 3, you can download its JSR 223 implementation from https://groovy-lang.org/. The JSR 223 implementation for GraalVM JavaScript is available at https://github.com/graalvm/graaljs. After you obtain the script engine files, add them to the Debezium connector plugin directory, along with any other JAR files that the language implementation uses.

2. Deployment

  • For security reasons, the content-based routing SMT is not included in the Debezium connector archives.
  • Instead, it is provided in a separate artifact, debezium-scripting-2.2.1.Final.tar.gz.

To use the content-based routing SMT with a Debezium connector plugin, you must explicitly add the SMT artifact to your Kafka Connect environment. Important: After the routing SMT is present in a Kafka Connect instance, any user who is allowed to add connectors to that instance can run script expressions. To ensure that script expressions can be run only by authorized users, secure the Kafka Connect instance and its configuration interface before you add the routing SMT.

After you install ZooKeeper, Kafka, Kafka Connect, and one or more Debezium connectors, the remaining tasks to install the content-based routing SMT are:

  1. Download the scripting SMT archive.
  2. Extract the contents of the archive into the Debezium plugin directory of your Kafka Connect environment.
  3. Obtain a JSR 223 script engine implementation and add its contents to the Debezium plugin directory of your Kafka Connect environment.
  4. Restart the Kafka Connect process so that it picks up the new JAR files.

The Groovy language requires the following libraries on the classpath:

  • groovy
  • groovy-json (optional)
  • groovy-jsr223

The JavaScript language requires the following libraries on the classpath:

  • graalvm.js
  • graalvm.js.scriptengine
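
As a rough sketch of where these files end up, the worker configuration below points Kafka Connect at a plugin directory. plugin.path is a standard Kafka Connect worker setting; the directory names and the layout shown in the comments are hypothetical and depend on your installation.

```properties
# Kafka Connect worker configuration (for example, connect-distributed.properties).
# The path is hypothetical; use the directory where your Debezium connectors are installed.
plugin.path=/opt/kafka/plugins

# Assumed layout after steps 1-3 (comments only; file names are placeholders):
#   /opt/kafka/plugins/debezium-connector-mysql/
#       debezium-connector-mysql-*.jar   <- connector JARs
#       debezium-scripting-*.jar         <- extracted from the scripting SMT archive
#       groovy-*.jar                     <- script engine libraries (Groovy case)
#       groovy-jsr223-*.jar
#       groovy-json-*.jar                <- optional
```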

3. Example: basic configuration

To configure a Debezium connector to route change event records based on event content, you configure the ContentBasedRouter SMT in the connector's Kafka Connect configuration.

Configuring the content-based routing SMT requires you to specify an expression that defines the routing criteria. The expression defines a pattern for evaluating event records, and it also specifies the name of the target topic to which events that match the pattern are routed. The pattern that you specify might designate an event type, such as a table insert, update, or delete operation. You can also define a pattern that matches a value in a specific column or row.

For example, to reroute all update (u) records to an updates topic, you would add the following to your connector configuration:

...
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
transforms.route.language=jsr223.groovy
transforms.route.topic.expression=value.op == 'u' ? 'updates' : null
...

The preceding example specifies the use of the Groovy expression language.

Records that do not match the pattern are routed to the default topic.

Custom configuration
The preceding example shows a simple SMT configuration that is designed to process only DML events, which contain an op field. Other types of messages that a connector might emit (heartbeat messages, tombstone messages, or metadata messages about transactions or schema changes) do not contain this field. To avoid processing failures, you can define an SMT predicate so that the transformation is applied only to specific events.
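
A minimal sketch of such a predicate follows. TopicNameMatches is one of the predicate classes that ships with Kafka Connect; the alias isDataTopic and the topic pattern are made up for illustration and assume that data change topics are named dbserver1.inventory.<table>.

```properties
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
transforms.route.language=jsr223.groovy
transforms.route.topic.expression=value.op == 'u' ? 'updates' : null
# Apply the SMT only to records for which the predicate is true.
transforms.route.predicate=isDataTopic

predicates=isDataTopic
predicates.isDataTopic.type=org.apache.kafka.connect.transforms.predicates.TopicNameMatches
# Hypothetical pattern; [.] avoids backslash escaping in a properties file.
predicates.isDataTopic.pattern=dbserver1[.]inventory[.].*
```

With this assumed naming, heartbeat, schema change, and transaction metadata topics do not match the pattern, so those records bypass the expression.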

4. Variables used in content-based routing expressions

Debezium binds certain variables into the evaluation context of the SMT. When you create expressions to specify the conditions that govern the routing destination, the SMT can look up and interpret the values of these variables to evaluate the conditions in the expression.

The following table lists the variables in the evaluation context that Debezium binds to the content-based routing SMT:

Table 1. Content-based routing expression variables

Name | Description | Type
key | The key of the message. | org.apache.kafka.connect.data.Struct
value | The value of the message. | org.apache.kafka.connect.data.Struct
keySchema | The schema of the message key. | org.apache.kafka.connect.data.Schema
valueSchema | The schema of the message value. | org.apache.kafka.connect.data.Schema
topic | The name of the target topic. | String
headers | A Java map of message headers, keyed by header name. Each entry exposes the properties value (of type Object) and schema (of type org.apache.kafka.connect.data.Schema). | java.util.Map<String, io.debezium.transforms.scripting.RecordHeader>

Expressions can call arbitrary methods on these variables. A routing expression should resolve to a String value or to null, and the result determines how the SMT handles the message: a non-null result reroutes the message to the topic with that name, and a null result leaves the message on its default topic.

Expressions should not cause any side effects. That is, they should not modify any of the variables that are passed to them.
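
To illustrate how several bound variables can be combined, the sketch below uses both topic and value in one Groovy expression. The source topic dbserver1.inventory.customers and the target topic customer-deletes are hypothetical names, not values from a real deployment.

```properties
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
transforms.route.language=jsr223.groovy
# Reroute delete (d) events from one assumed table topic; everything else
# evaluates to null and stays on its default topic.
transforms.route.topic.expression=topic == 'dbserver1.inventory.customers' && value.op == 'd' ? 'customer-deletes' : null
```

The expression only reads the variables and returns a topic name or null, so it has no side effects.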

5. Option to selectively apply transformations

In addition to the change event messages that a Debezium connector emits when a database change occurs, the connector also emits other types of messages, including heartbeat messages and metadata messages about schema changes and transactions. Because the structure of these other messages differs from that of the change event messages that the SMT is designed to process, it is best to configure the connector to apply the SMT selectively, so that it processes only the intended data change messages. You can configure a connector to apply the SMT selectively by using one of the following methods:

  • Configure an SMT predicate for the transformation.
  • Use the SMT's topic.regex configuration option.
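
As a sketch of the second method, the fragment below limits the routing logic to topics that match a regular expression. The pattern assumes data change topics named dbserver1.inventory.<table>, which is a hypothetical naming scheme.

```properties
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
transforms.route.language=jsr223.groovy
# Evaluate the expression only for matching topics; other records pass through unchanged.
# [.] is used instead of \. because backslashes need escaping in properties files.
transforms.route.topic.regex=dbserver1[.]inventory[.].*
transforms.route.topic.expression=value.op == 'u' ? 'updates' : null
```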

6. Language Details

How you express content-based routing conditions depends on the scripting language that you use. For example, as shown in the basic configuration example, when you use Groovy as the expression language, the following expression reroutes all update (u) records to the updates topic and routes other records to the default topic:

value.op == 'u' ? 'updates' : null

Other languages use different methods to express the same condition.

The Debezium MongoDB connector emits the after and patch fields as serialized JSON documents rather than as structures. To use the ContentBasedRouter SMT with the MongoDB connector, you must first unwind the array fields in the JSON into separate documents. One way to do this is to apply the MongoDB ExtractNewDocumentState SMT.

Alternatively, you can use a JSON parser within an expression to generate a separate output document for each array item.

For example, if you use Groovy as the expression language, add the groovy-json artifact to the classpath, and then use a condition such as (new groovy.json.JsonSlurper()).parseText(value.after).last_name == 'Kretchmar' in the routing expression.
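
A hedged sketch of how that condition might sit inside a complete routing expression. The target topic name kretchmar-events is invented for illustration, and the null check on value.after is an added assumption to guard against events that carry no after document, such as deletes.

```properties
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
transforms.route.language=jsr223.groovy
# Parse the serialized JSON in value.after, then route matching documents.
# 'kretchmar-events' is a hypothetical topic name; null keeps the default topic.
transforms.route.topic.expression=value.after != null && new groovy.json.JsonSlurper().parseText(value.after).last_name == 'Kretchmar' ? 'kretchmar-events' : null
```

Like the basic example, this assumes that the SMT sees only data change events, which contain an after field.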

JavaScript
When you use JavaScript as the expression language, you can call the Struct#get() method to specify the content-based routing condition, as in the following example:

value.get('op') == 'u' ? 'updates' : null

JavaScript with Graal.js

When creating content-based routing conditions using JavaScript with Graal.js, the approach used is similar to that used with Groovy. For example:

value.op == 'u' ? 'updates' : null
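
Placed in a connector configuration, the Graal.js variant might look like the following sketch. It mirrors the Groovy example and changes only the language setting; it assumes that the Graal.js script engine libraries are already on the classpath.

```properties
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
# The language value must start with jsr223.
transforms.route.language=jsr223.graal.js
transforms.route.topic.expression=value.op == 'u' ? 'updates' : null
```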

7. Configuration options

Property | Default | Description
topic.regex | (none) | An optional regular expression that evaluates the name of the target topic for an event to determine whether to apply the condition logic. If the name of the target topic matches the value in topic.regex, the transformation applies the condition logic before it passes the event to the topic. If the topic name does not match the value in topic.regex, the SMT passes the event to the topic unmodified.
language | (none) | The language in which the expression is written. Must begin with jsr223., for example, jsr223.groovy or jsr223.graal.js. Debezium supports bootstrapping only through the JSR 223 API (Scripting for the Java™ Platform).
topic.expression | (none) | The expression to be evaluated for every message. Must evaluate to a String value, where a non-null result reroutes the message to a new topic, and a null result routes the message to the default topic.
null.handling.mode | keep | Specifies how the transformation handles null (tombstone) messages. You can specify one of the following options: keep (default): pass the message through. drop: remove the message completely. evaluate: apply the condition logic to the message.
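
For instance, a configuration that removes tombstone messages instead of passing them through could look like this sketch, which reuses the transform alias route from the basic example:

```properties
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
transforms.route.language=jsr223.groovy
transforms.route.topic.expression=value.op == 'u' ? 'updates' : null
# keep (default) passes tombstones through, drop removes them,
# evaluate runs the expression on them.
transforms.route.null.handling.mode=drop
```

With evaluate, the expression itself has to cope with a null value, which this simple expression does not do.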

Origin blog.csdn.net/zhengzaifeidelushang/article/details/131263530