Rewrite NiFi PutDatabaseRecord to implement Oracle Merge/Upsert

Requirements

When Apache NiFi distributes enterprise master data to downstream business systems in real time, those systems run on MySQL, PostgreSQL, Oracle, and other databases. NiFi does not directly support upsert semantics for Oracle. When large volumes of master data such as products and materials are updated, distributing them to a downstream Oracle database by attempting an insert and falling back to an update on error does not perform well enough. We therefore plan to implement upsert in a custom Processor that uses Oracle's built-in MERGE INTO syntax.

The article Apache NiFi Custom Processor describes how to set up a development environment for custom NiFi Processors. This article shows how to add support for Oracle's built-in MERGE INTO to NiFi by modifying the PutDatabaseRecord.java class from the NiFi source code.

Development environment preparation

Follow the steps described in Apache NiFi Custom Processor to set up the development environment.

Custom Oracle upsert Processor

View the NiFi source code in VS Code

The author prefers IDEA as a Java IDE and uses VS Code for quick code searches and for data warehouse, front-end, Rust, and other development work. VS Code also starts noticeably faster than IDEA.

  • Download the latest NiFi source code from GitHub
  • Browse the NiFi source code and locate the PutDatabaseRecord.java file

Set up the IDEA development environment

Step 1: Comment out the entire MyProcessor.java class file

Step 2: Copy PutDatabaseRecord.java to the IDEA development environment

After the source code is copied directly into IDEA, errors may be reported. Fix them as follows:

  • In the main project pom file, change the parent version from 1.4.0 to 1.15.3:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-nar-bundles</artifactId>
        <version>1.15.3</version>
    </parent>

    <groupId>com</groupId>
    <artifactId>cvte</artifactId>
    <version>1.0</version>
    <packaging>pom</packaging>

    <modules>
        <module>nifi-cvte-processors</module>
        <module>nifi-cvte-nar</module>
    </modules>

</project>
  • In the processor sub-project pom, add the following dependencies:
<!--        DB controller service packages: database connection configuration and similar -->
        <dependency>
            <groupId>org.apache.nifi</groupId>
            <artifactId>nifi-standard-services-api-nar</artifactId>
            <version>1.15.3</version>
            <type>nar</type>
        </dependency>

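<!--        PutDatabaseRecord-related package. Comment out the following dependency and rewrite the DatabaseAdapter class; otherwise version 1.0 copies of the standard Processors are generated. See the source code for details. -->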
        <dependency>
            <groupId>org.apache.nifi</groupId>
            <artifactId>nifi-standard-processors</artifactId>
            <version>1.15.3</version>
            <scope>provided</scope>
        </dependency>

<!--        org.apache.nifi.serialization.RecordReaderFactory -->
        <dependency>
            <groupId>org.apache.nifi</groupId>
            <artifactId>nifi-record-serialization-service-api</artifactId>
            <version>1.15.3</version>
            <scope>provided</scope>
        </dependency>

Step 3: Modify the class name

To avoid a name clash between the copied PutDatabaseRecord class and NiFi's standard Processor of the same name, use IDEA's Rename refactoring to change the class name to PutOracleDatabaseRecordMerge. After renaming, also update the class name referenced in the resource file (META-INF/services/org.apache.nifi.processor.Processor).

Add the database adapter classes

  • Copy DatabaseAdapter from the NiFi source code and rename the class to CvteDatabaseAdapter, leaving its content unchanged.
  • Copy PostgreSQLDatabaseAdapter from the NiFi source code and rename the class to CvteOracleDatabaseAdapter. The modified code is as follows:
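
The adapter listing itself is not included in this text. As a rough illustration only, and not the author's code, the sketch below shows how an adapter method could assemble an Oracle MERGE INTO statement. The class name OracleMergeSketch and the aliases t and s are hypothetical. The method signature is modeled on the getUpsertStatement method that PostgreSQLDatabaseAdapter implements, which is an assumption about the NiFi version in use. Identifiers are emitted unquoted and key matching is case-sensitive, both simplifications.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of NiFi: builds an Oracle MERGE statement
// with exactly one '?' bind parameter per column, in column order.
class OracleMergeSketch {

    static String getUpsertStatement(String table, List<String> columnNames,
                                     Collection<String> uniqueKeyColumnNames) {
        if (table == null || table.trim().isEmpty()) {
            throw new IllegalArgumentException("Table name cannot be null or blank");
        }
        if (columnNames == null || columnNames.isEmpty()) {
            throw new IllegalArgumentException("Column names cannot be null or empty");
        }
        if (uniqueKeyColumnNames == null || uniqueKeyColumnNames.isEmpty()) {
            throw new IllegalArgumentException("Key column names cannot be null or empty");
        }

        // Source row: SELECT ? AS c1, ? AS c2 ... FROM dual
        String source = columnNames.stream()
                .map(c -> "? AS " + c)
                .collect(Collectors.joining(", "));

        // Match condition on the key columns
        String on = uniqueKeyColumnNames.stream()
                .map(k -> "t." + k + " = s." + k)
                .collect(Collectors.joining(" AND "));

        // Oracle rejects updates to columns referenced in the ON clause,
        // so only non-key columns go into UPDATE SET (case-sensitive match).
        List<String> nonKeys = columnNames.stream()
                .filter(c -> !uniqueKeyColumnNames.contains(c))
                .collect(Collectors.toList());

        StringBuilder sql = new StringBuilder()
                .append("MERGE INTO ").append(table)
                .append(" t USING (SELECT ").append(source).append(" FROM dual) s")
                .append(" ON (").append(on).append(")");

        // If every column is a key there is nothing to update; skip the clause.
        if (!nonKeys.isEmpty()) {
            sql.append(" WHEN MATCHED THEN UPDATE SET ")
               .append(nonKeys.stream()
                       .map(c -> "t." + c + " = s." + c)
                       .collect(Collectors.joining(", ")));
        }

        sql.append(" WHEN NOT MATCHED THEN INSERT (")
           .append(String.join(", ", columnNames))
           .append(") VALUES (")
           .append(columnNames.stream()
                   .map(c -> "s." + c)
                   .collect(Collectors.joining(", ")))
           .append(")");

        return sql.toString();
    }
}
```

For a table PRODUCT with columns ID, NAME, PRICE and key column ID, this yields a single statement that updates NAME and PRICE when the ID already exists and inserts the row otherwise. Because each column is bound only once, the record's field values can be set in the same order as for a plain insert.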

Modify PutDatabaseRecordMerge code

Modify META-INF

https://github.com/dawsongzhao1104/nifi/tree/main/nifi-processor

Step 4: Package and run

⚠️ Problem: after deployment, a large number of standard components appear with an additional version 1.0. This seems to have been introduced by our build. The PutDatabaseRecordMerge.nar package we compiled is larger than 58 MB, whereas the standard processor nar package is about 12 MB. Our guess is that the packaging tool copied the standard Processors into our own nar.

  • To stop the standard components from being added as version 1.0, comment out the nifi-standard-processors dependency and provide our own copy of the DatabaseAdapter class.

Verification Test

Configure database service

Configure JsonTreeReader

Configure JsonRecordSetWriter

Configure ExecuteSQLRecord

Configure verification environment

Summary

This article explains how to implement Oracle merge by rewriting the PutDatabaseRecord source code. It also looks at the problem of standard Processors being generated under the custom version number, which is worked around by rewriting DatabaseAdapter and commenting out the nifi-standard-processors dependency.

For any questions about this article, feel free to discuss them on QQ: 568072887

Origin blog.csdn.net/zdsx1104/article/details/124418818