Kafka Series: Custom Conversion Transformation

1. Custom conversion

If none of the available Single Message Transformations (SMTs) provide the necessary transformation, you can create your own.

An important concept to understand first is that an SMT implementation typically puts most of its logic in an abstract base class. The implementation then provides two concrete subclasses, named Key and Value, which determine whether the key or the value of the Connect record is processed. When configuring the transformation, the user specifies the fully qualified class name of either the Key or the Value class.
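As a sketch of this pattern (the package and class names below are hypothetical, not taken from Kafka itself), a simple SMT that upper-cases String keys or values could be structured like this:

package com.example.smt; // hypothetical package

import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

// The abstract base class holds the shared logic; the nested Key and Value
// subclasses only decide which part of the record is transformed.
public abstract class UppercaseString<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public void configure(Map<String, ?> props) {
        // no configuration needed for this illustration
    }

    @Override
    public R apply(R record) {
        Object original = operatingValue(record);
        if (!(original instanceof String)) {
            return record; // only plain String keys/values are handled here
        }
        return newRecord(record, ((String) original).toUpperCase());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void close() {
    }

    // the two hooks that the concrete subclasses implement
    protected abstract Object operatingValue(R record);

    protected abstract R newRecord(R record, Object updated);

    // configured as transforms.<name>.type=com.example.smt.UppercaseString$Key
    public static class Key<R extends ConnectRecord<R>> extends UppercaseString<R> {
        @Override
        protected Object operatingValue(R record) {
            return record.key();
        }

        @Override
        protected R newRecord(R record, Object updated) {
            return record.newRecord(record.topic(), record.kafkaPartition(),
                    record.keySchema(), updated,
                    record.valueSchema(), record.value(), record.timestamp());
        }
    }

    // configured as transforms.<name>.type=com.example.smt.UppercaseString$Value
    public static class Value<R extends ConnectRecord<R>> extends UppercaseString<R> {
        @Override
        protected Object operatingValue(R record) {
            return record.value();
        }

        @Override
        protected R newRecord(R record, Object updated) {
            return record.newRecord(record.topic(), record.kafkaPartition(),
                    record.keySchema(), record.key(),
                    record.valueSchema(), updated, record.timestamp());
        }
    }
}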

The following are the high-level steps required to create and use a custom SMT.

1. Check out the Java source files of the SMTs that ship with Kafka Connect by default (in the org.apache.kafka.connect.transforms package). Use one of these as the basis for creating your new custom transformation.

The following methods are important to understand when reading these Java source files:

  • Search for apply() to see how this method is implemented.
  • Search for configure() to see how this method is implemented.

2. Write and compile your source code and unit tests. Example unit tests for SMTs can be found in the Apache Kafka GitHub project.
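For illustration, here is a minimal sketch of such a test, written against the built-in InsertField SMT (it assumes JUnit 5 plus the connect-api and connect-transforms artifacts on the test classpath; the topic and field names are made up). A test for your own transform would follow the same configure/apply/assert pattern:

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.transforms.InsertField;
import org.junit.jupiter.api.Test;

public class InsertFieldExampleTest {

    @Test
    public void shouldInsertStaticFieldIntoSchemalessValue() {
        // Configure the built-in InsertField$Value SMT to add a constant field to the record value.
        try (InsertField.Value<SourceRecord> transform = new InsertField.Value<>()) {
            Map<String, Object> config = new HashMap<>();
            config.put("static.field", "source_system");
            config.put("static.value", "orders-db");
            transform.configure(config);

            // A schemaless record: the value is a plain Map.
            Map<String, Object> value = new HashMap<>();
            value.put("id", 42);
            SourceRecord record = new SourceRecord(null, null, "orders", 0, null, null, null, value);

            SourceRecord transformed = transform.apply(record);

            Map<?, ?> transformedValue = (Map<?, ?>) transformed.value();
            assertEquals("orders-db", transformedValue.get("source_system"));
            assertEquals(42, transformedValue.get("id"));
        }
    }
}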

3. Create your JAR file.

4. Install the JAR file. Copy the custom SMT JAR file (and any non-Kafka JAR files it depends on) into a directory under one of the directories listed in the plugin.path property of the Connect worker configuration file, for example:

plugin.path=/usr/local/share/kafka/plugins

For example, create a directory named my-custom-smt under /usr/local/share/kafka/plugins and copy the JAR file into the my-custom-smt directory.

Make sure to do this on all worker nodes.

Start the workers and connectors, and try your custom transformations.
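For example, assuming the hypothetical com.example.smt.UppercaseString transform sketched earlier has been packaged and installed, a connector configuration could reference it through the standard transforms properties (the alias upper is arbitrary):

transforms=upper
transforms.upper.type=com.example.smt.UppercaseString$Value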

The Connect worker logs every transform class it finds at DEBUG level. Enable DEBUG logging and verify that your transform class is found. If it is not, check the JAR installation and make sure the file is in the correct location.
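With the default Log4j setup that ships with Kafka (config/connect-log4j.properties), one way to do this is, for example, to raise the log level of the plugin-loading package:

log4j.logger.org.apache.kafka.connect.runtime.isolation=DEBUG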

2. Transformation example

The following code defines a class named HeaderToValue, which implements the Kafka Connect Transformation interface and moves or copies header information from a Kafka message into the message body (the record value).

  • First, a number of constants and an enumeration type are defined, including the header and field names to be processed and the type of operation to be performed (move or copy).
  • Then the configuration is defined: three fields named headers, fields and operation, together with validation rules and descriptions for each.
  • In the class body, the config, configure and apply methods are overridden. config returns a ConfigDef object that describes the configuration of this class; configure reads and validates the configuration and initializes some internal state; apply contains the actual conversion logic that moves or copies the header information into the message body and returns the modified record.
  • In addition, some auxiliary methods are defined: creating a new Schema from the header and field information, creating a new message body from that information, and helpers for log output and debugging.

The package declaration, placing this class in the io.debezium.transforms namespace of the Debezium framework.

package io.debezium.transforms;

These lines import the classes, interfaces, and enumeration types that are used, as well as the statically imported methods and constants.

import static io.debezium.transforms.HeaderToValue.Operation.MOVE;
import static java.lang.String.format;
import static org.apache.kafka.connect.transforms.util.Requirements.requireStruct;

import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigException;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.header.Header;
import org.apache.kafka.connect.header.Headers;
import org.apache.kafka.connect.transforms.Transformation;
import org.apache.kafka.connect.transforms.util.SchemaUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import io.debezium.config.Configuration;
import io.debezium.config.Field;
import io.debezium.util.BoundedConcurrentHashMap;

This code declares a class named HeaderToValue, which implements the Kafka Connect Transformation interface.

public class HeaderToValue<R extends ConnectRecord<R>> implements Transformation<R> {

This code defines some static constants, an enumeration type, and the field variables used to store the parsed configuration.

private static final Logger LOGGER = LoggerFactory.getLogger(HeaderToValue.class);
public static final String FIELDS_CONF = "fields";
public static final String HEADERS_CONF = "headers";
public static final String OPERATION_CONF = "operation";
private static final String MOVE_OPERATION = "move";
private static final String COPY_OPERATION = "copy";
private static final int CACHE_SIZE = 64;
public static final String NESTING_SEPARATOR = ".";
public static final String ROOT_FIELD_NAME = "payload";

enum Operation {

    MOVE(MOVE_OPERATION),
    COPY(COPY_OPERATION);

    private final String name;

    Operation(String name) {
        this.name = name;
    }

    // Resolve the configured operation name ("move" or "copy") to the corresponding constant.
    static Operation fromName(String name) {
        switch (name) {
            case MOVE_OPERATION:
                return MOVE;
            case COPY_OPERATION:
                return COPY;
            default:
                throw new IllegalArgumentException();
        }
    }

    @Override
    public String toString() {
        return name;
    }
}

public static final Field HEADERS_FIELD = Field.create(HEADERS_CONF)
        .withDisplayName("Header names list")
        .withType(ConfigDef.Type.LIST)
        .withImportance(ConfigDef.Importance.HIGH)
        .withValidation(
                Field::notContainSpaceInAnyElement,
                Field::notContainEmptyElements)
        .withDescription("Header names in the record whose values are to be copied or moved to record value.")
        .required();

public static final Field FIELDS_FIELD = Field.create(FIELDS_CONF)
        .withDisplayName("Field names list")
        .withType(ConfigDef.Type.LIST)
        .withImportance(ConfigDef.Importance.HIGH)
        .withValidation(
                Field::notContainSpaceInAnyElement,
                Field::notContainEmptyElements)
        .withDescription(
                "Field names, in the same order as the header names listed in the headers configuration property. Supports Struct nesting using dot notation.")
        .required();

public static final Field OPERATION_FIELD = Field.create(OPERATION_CONF)
        .withDisplayName("Operation: mover or copy")
        .withType(ConfigDef.Type.STRING)
        .withEnum(Operation.class)
        .withImportance(ConfigDef.Importance.HIGH)
        .withDescription("Either <code>move</code> if the fields are to be moved to the value (removed from the headers), " +
                "or <code>copy</code> if the fields are to be copied to the value (retained in the headers).")
        .required();

private List<String> fields;

private List<String> headers;

private Operation operation;

private final BoundedConcurrentHashMap<Schema, Schema> schemaUpdateCache = new BoundedConcurrentHashMap<>(CACHE_SIZE);
private final BoundedConcurrentHashMap<Headers, Headers> headersUpdateCache = new BoundedConcurrentHashMap<>(CACHE_SIZE);

This code implements the config and configure methods of the Transformation interface, which handle the configuration of this class. The config method returns a ConfigDef object that describes the configuration the class accepts; the configure method reads and validates the configuration and initializes some internal state.

@Override
public ConfigDef config() {
    final ConfigDef config = new ConfigDef();
    Field.group(config, null, HEADERS_FIELD, FIELDS_FIELD, OPERATION_FIELD);
    return config;
}

@Override
public void configure(Map<String, ?> props) {
    final Configuration config = Configuration.from(props);
    SmtManager<R> smtManager = new SmtManager<>(config);
    smtManager.validate(config, Field.setOf(FIELDS_FIELD, HEADERS_FIELD, OPERATION_FIELD));

    fields = config.getList(FIELDS_FIELD);
    headers = config.getList(HEADERS_FIELD);

    validateConfiguration();

    operation = Operation.fromName(config.getString(OPERATION_FIELD));
}

private void validateConfiguration() {
    if (headers.size() != fields.size()) {
        throw new ConfigException(format("'%s' config must have the same number of elements as '%s' config.",
                FIELDS_FIELD, HEADERS_FIELD));
    }
}

This part of the code implements the apply method of the Transformation interface, which converts an input record and returns the converted record.

In the apply method, the code first extracts the headers that need to be processed and then uses the given configuration to modify the value object and produce a new record. Finally, the method returns the transformed record. Several auxiliary methods are involved along the way, such as removeHeaders, isContainedIn, and makeNewSchema.

    @Override
    public R apply(R record) {
        // Only records whose value is a Struct can receive new fields.
        final Struct value = requireStruct(record.value(), "Header field insertion");

        LOGGER.trace("Processing record {}", value);
        Map<String, Header> headerToProcess = StreamSupport.stream(record.headers().spliterator(), false)
                .filter(header -> headers.contains(header.key()))
                .collect(Collectors.toMap(Header::key, Function.identity()));

        if (LOGGER.isTraceEnabled()) {
            LOGGER.trace("Header to be processed: {}", headersToString(headerToProcess));
        }

        if (headerToProcess.isEmpty()) {
            return record;
        }

        // Compute (and cache) the value schema extended with the configured header fields.
        Schema updatedSchema = schemaUpdateCache.computeIfAbsent(value.schema(), valueSchema -> makeNewSchema(valueSchema, headerToProcess));

        LOGGER.trace("Updated schema fields: {}", updatedSchema.fields());

        Struct updatedValue = makeUpdatedValue(value, headerToProcess, updatedSchema);

        LOGGER.trace("Updated value: {}", updatedValue);

        Headers updatedHeaders = record.headers();
        if (MOVE.equals(operation)) {
            // For the "move" operation the processed headers are removed from the record.
            updatedHeaders = headersUpdateCache.computeIfAbsent(record.headers(), this::removeHeaders);
        }

        return record.newRecord(
                record.topic(),
                record.kafkaPartition(),
                record.keySchema(),
                record.key(),
                updatedSchema,
                updatedValue,
                record.timestamp(),
                updatedHeaders);
    }

    private Headers removeHeaders(Headers originalHeaders) {
        Headers updatedHeaders = originalHeaders.duplicate();
        headers.forEach(updatedHeaders::remove);

        return updatedHeaders;
    }

    private Struct makeUpdatedValue(Struct originalValue, Map<String, Header> headerToProcess, Schema updatedSchema) {
        List<String> nestedFields = fields.stream().filter(field -> field.contains(NESTING_SEPARATOR)).collect(Collectors.toList());

        return buildUpdatedValue(ROOT_FIELD_NAME, originalValue, headerToProcess, updatedSchema, nestedFields, 0);
    }

    private Struct buildUpdatedValue(String fieldName, Struct originalValue, Map<String, Header> headerToProcess, Schema updatedSchema, List<String> nestedFields,
                                     int level) {
        // Copy the original fields, recursing into structs that are targets of nested destination fields.
        Struct updatedValue = new Struct(updatedSchema);
        for (org.apache.kafka.connect.data.Field field : originalValue.schema().fields()) {
            if (originalValue.get(field) != null) {
                if (isContainedIn(field.name(), nestedFields)) {
                    Struct nestedField = requireStruct(originalValue.get(field), "Nested field");
                    updatedValue.put(field.name(),
                            buildUpdatedValue(field.name(), nestedField, headerToProcess, updatedSchema.field(field.name()).schema(), nestedFields, ++level));
                }
                else {
                    updatedValue.put(field.name(), originalValue.get(field));
                }
            }
        }

        // Add the configured header values at the current nesting level.
        for (int i = 0; i < headers.size(); i++) {
            Header currentHeader = headerToProcess.get(headers.get(i));

            if (currentHeader != null) {
                Optional<String> fieldNameToAdd = getFieldName(fields.get(i), fieldName, level);
                fieldNameToAdd.ifPresent(s -> updatedValue.put(s, currentHeader.value()));
            }
        }

        return updatedValue;
    }

    private boolean isContainedIn(String fieldName, List<String> nestedFields) {
        return nestedFields.stream().anyMatch(s -> s.contains(fieldName));
    }

    private Schema makeNewSchema(Schema oldSchema, Map<String, Header> headerToProcess) {
        List<String> nestedFields = fields.stream().filter(field -> field.contains(NESTING_SEPARATOR)).collect(Collectors.toList());

        return buildNewSchema(ROOT_FIELD_NAME, oldSchema, headerToProcess, nestedFields, 0);
    }

    private Schema buildNewSchema(String fieldName, Schema oldSchema, Map<String, Header> headerToProcess, List<String> nestedFields, int level) {
        if (oldSchema.type().isPrimitive()) {
            return oldSchema;
        }

        // Get fields from original schema
        SchemaBuilder newSchemabuilder = SchemaUtil.copySchemaBasics(oldSchema, SchemaBuilder.struct());
        for (org.apache.kafka.connect.data.Field field : oldSchema.fields()) {
            if (isContainedIn(field.name(), nestedFields)) {
                newSchemabuilder.field(field.name(), buildNewSchema(field.name(), field.schema(), headerToProcess, nestedFields, ++level));
            }
            else {
                newSchemabuilder.field(field.name(), field.schema());
            }
        }

        LOGGER.debug("Fields copied from the old schema {}", newSchemabuilder.fields());
        // Append a field for every configured header that maps to this nesting level.
        for (int i = 0; i < headers.size(); i++) {
            Header currentHeader = headerToProcess.get(headers.get(i));
            Optional<String> currentFieldName = getFieldName(fields.get(i), fieldName, level);
            LOGGER.trace("CurrentHeader {} - currentFieldName {}", headers.get(i), currentFieldName);
            if (currentFieldName.isPresent() && currentHeader != null) {
                newSchemabuilder = newSchemabuilder.field(currentFieldName.get(), currentHeader.schema());
            }
        }
        LOGGER.debug("Fields added from headers {}", newSchemabuilder.fields());
        return newSchemabuilder.build();
    }

    private Optional<String> getFieldName(String destinationFieldName, String fieldName, int level) {
        String[] nestedNames = destinationFieldName.split("\\.");
        if (isRootField(fieldName, nestedNames)) {
            return Optional.of(nestedNames[0]);
        }

        if (isChildrenOf(fieldName, level, nestedNames)) {
            return Optional.of(nestedNames[level]);
        }

        return Optional.empty();
    }

    private static boolean isChildrenOf(String fieldName, int level, String[] nestedNames) {
        int parentLevel = level == 0 ? 0 : level - 1;
        return nestedNames[parentLevel].equals(fieldName);
    }

    private static boolean isRootField(String fieldName, String[] nestedNames) {
        return nestedNames.length == 1 && fieldName.equals(ROOT_FIELD_NAME);
    }

    private String headersToString(Map<?, ?> map) {
        return map.keySet().stream()
                .map(key -> key + "=" + map.get(key))
                .collect(Collectors.joining(", ", "{", "}"));
    }

    @Override
    public void close() {
    }
}
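
Based on the configuration fields defined above (headers, fields and operation), a connector could use this transformation with settings along the following lines; the alias and the header and field names are purely illustrative:

transforms=headerToValue
transforms.headerToValue.type=io.debezium.transforms.HeaderToValue
transforms.headerToValue.headers=appId,traceId
transforms.headerToValue.fields=app_id,trace_id
transforms.headerToValue.operation=move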

