Requirements
When Apache NiFi distributes enterprise master data to downstream business systems in real time, those systems run on MySQL, PostgreSQL, Oracle and other databases. NiFi does not directly support upsert semantics for Oracle, so when large volumes of master data such as products and materials are updated, the data has to be distributed to Oracle with an insert-then-update-on-error approach, and its performance is insufficient. We therefore plan to implement the upsert function in a custom Processor that uses Oracle's built-in MERGE INTO syntax.
The article Apache NiFi Custom Processor introduced how to set up the development environment for custom NiFi Processors. This article shows how to add support for Oracle's built-in MERGE INTO to NiFi by modifying the PutDatabaseRecord.java class from the NiFi source code.
Development environment preparation
Follow the steps described in Apache NiFi Custom Processor to set up the development environment.
Custom Oracle upsert Processor
View the NiFi source code in VS Code
The author prefers IDEA as the Java development IDE and uses VS Code for quick code searches and for data warehouse, front-end, Rust and other development work. VS Code also starts noticeably faster than IDEA.
- Download the latest NiFi code from GitHub
- Search the NiFi source code to find the PutDatabaseRecord.java file
Build the IDEA development environment
Step 1: Comment out the entire MyProcessor.java class file
Step 2: Copy PutDatabaseRecord.java to the IDEA development environment
After copying the source code directly into IDEA, errors like those shown in the picture above may appear. Fix them as follows:
- In the main project pom file, change the parent version from 1.4.0 to 1.15.3
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-nar-bundles</artifactId>
<version>1.15.3</version>
</parent>
<groupId>com</groupId>
<artifactId>cvte</artifactId>
<version>1.0</version>
<packaging>pom</packaging>
<modules>
<module>nifi-cvte-processors</module>
<module>nifi-cvte-nar</module>
</modules>
</project>
- In the processor sub-project pom, add the following dependencies
<!-- Packages for the DB controller services, e.g. database connection configuration -->
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-standard-services-api-nar</artifactId>
<version>1.15.3</version>
<type>nar</type>
</dependency>
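<!-- PutDatabaseRecord-related package. Comment out this dependency and rewrite the DatabaseAdapter class; otherwise version 1.0 copies of the standard Processors are generated. See the source code for details. -->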
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-standard-processors</artifactId>
<version>1.15.3</version>
<scope>provided</scope>
</dependency>
<!-- org.apache.nifi.serialization.RecordReaderFactory -->
<dependency>
<groupId>org.apache.nifi</groupId>
<artifactId>nifi-record-serialization-service-api</artifactId>
<version>1.15.3</version>
<scope>provided</scope>
</dependency>
Step 3: Modify the class name
To keep the custom PutDatabaseRecord class from clashing with the class name of the NiFi standard Processor, use IDEA's Rename refactoring to change the class name to PutOracleDatabaseRecordMerge. After renaming, also update the class name referenced in the resource file.
Add the database Adapter classes
- Copy DatabaseAdapter from the NiFi source code and rename the class to CvteDatabaseAdapter, leaving the code unchanged.
- Copy PostgreSQLDatabaseAdapter from the NiFi source code and rename the class to CvteOracleDatabaseAdapter. The modified code is as follows:
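As an illustration only (this is not the author's adapter code), the sketch below shows the kind of Oracle MERGE INTO statement an adapter's upsert method could generate. The class name OracleMergeSketch and the method buildMergeStatement are hypothetical. The parameters mirror the table, column names and unique key column names that NiFi's DatabaseAdapter passes to getUpsertStatement, and the sketch assumes one bind placeholder per column, in column order. Key columns are left out of the UPDATE SET list because Oracle does not allow columns referenced in the ON clause to be updated.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of NiFi: builds an Oracle MERGE INTO statement.
final class OracleMergeSketch {

    static String buildMergeStatement(String table, List<String> columnNames,
                                      Collection<String> uniqueKeyColumnNames) {
        if (table == null || table.trim().isEmpty()) {
            throw new IllegalArgumentException("Table name cannot be null or blank");
        }
        if (columnNames == null || columnNames.isEmpty()) {
            throw new IllegalArgumentException("Column names cannot be null or empty");
        }
        if (uniqueKeyColumnNames == null || uniqueKeyColumnNames.isEmpty()) {
            throw new IllegalArgumentException("Key column names cannot be null or empty");
        }

        // Source row: one bind placeholder per column, selected from DUAL.
        String source = columnNames.stream()
                .map(c -> "? AS " + c)
                .collect(Collectors.joining(", "));

        // Match target rows (alias t) against the source row (alias s) on the key columns.
        String on = uniqueKeyColumnNames.stream()
                .map(k -> "t." + k + " = s." + k)
                .collect(Collectors.joining(" AND "));

        // Only non-key columns may appear in UPDATE SET.
        List<String> nonKeyColumns = columnNames.stream()
                .filter(c -> !uniqueKeyColumnNames.contains(c))
                .collect(Collectors.toList());

        StringBuilder sql = new StringBuilder()
                .append("MERGE INTO ").append(table).append(" t")
                .append(" USING (SELECT ").append(source).append(" FROM dual) s")
                .append(" ON (").append(on).append(")");

        // Skip the update branch when every column is part of the key.
        if (!nonKeyColumns.isEmpty()) {
            sql.append(" WHEN MATCHED THEN UPDATE SET ")
               .append(nonKeyColumns.stream()
                       .map(c -> "t." + c + " = s." + c)
                       .collect(Collectors.joining(", ")));
        }

        sql.append(" WHEN NOT MATCHED THEN INSERT (")
           .append(String.join(", ", columnNames))
           .append(") VALUES (")
           .append(columnNames.stream().map(c -> "s." + c).collect(Collectors.joining(", ")))
           .append(")");
        return sql.toString();
    }

    public static void main(String[] args) {
        // Example table and columns are made up for illustration.
        System.out.println(buildMergeStatement(
                "PRODUCT", Arrays.asList("ID", "NAME", "PRICE"), Arrays.asList("ID")));
    }
}
```

In a real adapter, table and column names would also need quoting or escaping as appropriate, which this sketch leaves out.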
Modify the PutDatabaseRecordMerge code
Modify META-INF
https://github.com/dawsongzhao1104/nifi/tree/main/nifi-processor
Step 4: Package and run
⚠️ Problem: many standard components now appear with an additional version 1.0, which seems to have been introduced by our package. The PutDatabaseRecordMerge.nar package we compiled is larger than 58 MB, while the standard processor nar package is about 12 MB. Our guess is that the packaging tool copied the standard Processors into our own Processor package.
- To stop the version 1.0 copies of the standard components from being added, comment out the nifi-standard-processors dependency and rewrite the DatabaseAdapter class in our own project.
Verification Test
Configure database service
Configure JsonTreeReader
Configure JsonRecordSetWriter
Configure ExecuteSQLRecord
Configure verification environment
References
[1] https://community.cloudera.com/t5/Community-Articles/Build-Custom-Nifi-Processor/ta-p/244734
[2] https://www.youtube.com/watch?v=v2u0WsPs2Ac
Summary
This article explained how to implement Oracle MERGE by rewriting the PutDatabaseRecord source code. It also looked at the problem of standard Processors being generated with the custom version number, and worked around it by rewriting DatabaseAdapter and commenting out the nifi-standard-processors dependency.
If you have any questions about this article, feel free to discuss them via QQ: 568072887