DataX secondary development: supporting update (upsert) writes for Oracle

Previous articles:
"DataX and DataX-Web Installation and Use Detailed Explanation"
"DataX Source Code Debugging and Packaging"
"DataX-Web Source Code Debugging and Packaging"

Many mainstream databases support an on duplicate key update mode (update the row when the primary key conflicts), and DataX lets you choose the write mode through the writeMode option. For Oracle, however, the writer currently supports only the insert option.

How can the Oracle writer be adapted to an on duplicate key update mode? Today we will do it through secondary development.


1. Principle

Under the hood, the insert, replace, and update options of writeMode use the INSERT INTO, REPLACE INTO, and INSERT INTO … ON DUPLICATE KEY UPDATE statements respectively.

With INSERT INTO, conflicting rows are not written when a primary key/unique index conflict occurs. The latter two behave like INSERT INTO when there is no primary key/unique index conflict; when a conflict occurs, they replace all fields of the original row with those of the new row.

Oracle does not support MySQL-style REPLACE INTO or INSERT … ON DUPLICATE KEY UPDATE, so only the insert option is supported. To implement this feature we need Oracle's MERGE statement. Let's look at the MERGE syntax first.

MERGE INTO [target-table] A USING [source-table sql] B 
ON([conditional expression] and [...]...) 
WHEN MATCHED THEN
 [UPDATE sql] 
WHEN NOT MATCHED THEN 
 [INSERT sql]

In other words, MERGE updates the row if a match for the ON condition exists, and inserts a new row if it does not.

Example:

MERGE INTO USERS A USING ( SELECT 18 AS "ID",'chaodev' AS "USER_ID" FROM DUAL ) TMP 
ON (TMP."ID" = A."ID" AND TMP."USER_ID" = A."USER_ID" ) 
WHEN MATCHED THEN 
UPDATE SET "USER_NAME" = '大佬超',"USER_PHONE" = '18000000000',"LASTUPDATETIME" = SYSDATE 
WHEN NOT MATCHED THEN 
INSERT ("ID","USER_ID","USER_NAME","USER_PHONE","LASTUPDATETIME") VALUES(18,'chaodev','大佬超','18000000000',SYSDATE)

So the implementation principle is: modify the oraclewriter source code of DataX and implement UPSERT semantics through the MERGE INTO statement.


2. Source code modification

The classes and methods that need to be modified are as follows:

oraclewriter module:

com.alibaba.datax.plugin.writer.oraclewriter.OracleWriter: modified to allow the user to configure writeMode.

plugin-rdbms-util module:

com.alibaba.datax.plugin.rdbms.writer.util.WriterUtil: add the Oracle logic.

com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter: in the CommonRdbmsWriter.Task class, modify the startWriteWithConnection(), doBatchInsert(), and fillPreparedStatement() methods.


2.1 OracleWriter: comment out the restriction on writeMode

In OracleWriter, comment out the check that rejects a configured writeMode, so that the user can set this option.


2.2 WriterUtil: add the Oracle logic

Add the Oracle branch to getWriteTemplate(), as follows:

/**
 * Adds Oracle support for update mode.
 * @author 程序员大佬超
 * @date 20221202
 * @param columnHolders
 * @param valueHolders
 * @param writeMode
 * @param dataBaseType
 * @param forceUseUpdate
 * @return
 */
public static String getWriteTemplate(List<String> columnHolders, List<String> valueHolders,
                                      String writeMode, DataBaseType dataBaseType, boolean forceUseUpdate)
{
    String mode = writeMode.trim().toLowerCase();
    boolean isWriteModeLegal = mode.startsWith("insert") || mode.startsWith("replace") || mode.startsWith("update");

    if (!isWriteModeLegal) {
        // Message: the configured writeMode is invalid; DataX only supports replace, update or insert.
        throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,
                                              String.format("您所配置的 writeMode:%s 错误. 因为DataX 目前仅支持replace,update 或 insert 方式. 请检查您的配置并作出修改.", writeMode));
    }
    String writeDataSqlTemplate;
    if (forceUseUpdate || mode.startsWith("update")) {
        if (dataBaseType == DataBaseType.MySql || dataBaseType == DataBaseType.Tddl) {
            // MySQL/Tddl: INSERT ... ON DUPLICATE KEY UPDATE
            writeDataSqlTemplate = new StringBuilder()
                .append("INSERT INTO %s (").append(StringUtils.join(columnHolders, ","))
                .append(") VALUES(").append(StringUtils.join(valueHolders, ","))
                .append(")")
                .append(onDuplicateKeyUpdateString(columnHolders))
                .toString();
        }
        else if (dataBaseType == DataBaseType.Oracle) {
            // Oracle: MERGE INTO ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...
            writeDataSqlTemplate = new StringBuilder().append(onMergeIntoDoString(writeMode, columnHolders, valueHolders)).append("INSERT (")
                .append(StringUtils.join(columnHolders, ","))
                .append(") VALUES(").append(StringUtils.join(valueHolders, ","))
                .append(")").toString();
        }
        else {
            // Message: the current database does not support this writeMode.
            throw DataXException.asDataXException(DBUtilErrorCode.ILLEGAL_VALUE,
                                                  String.format("当前数据库不支持 writeMode:%s 模式.", writeMode));
        }
    }
    else {
        // Safeguard: if update was used incorrectly elsewhere, fall back to replace
        if (writeMode.trim().toLowerCase().startsWith("update")) {
            writeMode = "replace";
        }
        writeDataSqlTemplate = new StringBuilder().append(writeMode)
            .append(" INTO %s (").append(StringUtils.join(columnHolders, ","))
            .append(") VALUES(").append(StringUtils.join(valueHolders, ","))
            .append(")").toString();
    }

    return writeDataSqlTemplate;
}

The helper methods it calls are added as follows; their main job is to splice together the MERGE statement:

public static String onMergeIntoDoString(String merge, List<String> columnHolders, List<String> valueHolders) {
    // Key columns listed in writeMode, e.g. update("ID","USER_ID")
    String[] sArray = getStrings(merge);
    StringBuilder sb = new StringBuilder();
    sb.append("MERGE INTO %s A USING ( SELECT ");

    boolean first = true;
    boolean first1 = true;
    StringBuilder str = new StringBuilder();     // ON condition
    StringBuilder update = new StringBuilder();  // UPDATE SET list
    // Key columns go into the USING sub-query and the ON condition
    for (String columnHolder : columnHolders) {
        if (Arrays.asList(sArray).contains(columnHolder)) {
            if (!first) {
                sb.append(",");
                str.append(" AND ");
            } else {
                first = false;
            }
            str.append("TMP.").append(columnHolder);
            sb.append("?");
            str.append(" = ");
            sb.append(" AS ");
            str.append("A.").append(columnHolder);
            sb.append(columnHolder);
        }
    }

    // All other columns go into the UPDATE SET list
    for (String columnHolder : columnHolders) {
        if (!Arrays.asList(sArray).contains(columnHolder)) {
            if (!first1) {
                update.append(",");
            } else {
                first1 = false;
            }
            update.append(columnHolder);
            update.append(" = ");
            update.append("?");
        }
    }

    sb.append(" FROM DUAL ) TMP ON (");
    sb.append(str);
    sb.append(" ) WHEN MATCHED THEN UPDATE SET ");
    sb.append(update);
    sb.append(" WHEN NOT MATCHED THEN ");
    return sb.toString();
}

// Extracts the column names from a writeMode such as update("ID","USER_ID")
public static String[] getStrings(String merge) {
    merge = merge.replace("update", "");
    merge = merge.replace("(", "");
    merge = merge.replace(")", "");
    merge = merge.replace(" ", "");
    return merge.split(",");
}
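
To make the spliced result easier to picture, here is a minimal, self-contained sketch that rebuilds the same statement shape for a three-column table. The class name MergeTemplateDemo and the method buildMergeTemplate are hypothetical names used only for this illustration; they are not part of DataX, and the key columns are passed in as a list instead of being parsed from writeMode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeTemplateDemo {

    // Hypothetical helper: builds the same statement shape as the methods above.
    // The %s placeholder is left in place for the table name.
    static String buildMergeTemplate(List<String> columns, List<String> keys) {
        List<String> using = new ArrayList<>();   // "? AS col" for each key column
        List<String> on = new ArrayList<>();      // "TMP.col = A.col" for each key column
        List<String> set = new ArrayList<>();     // "col = ?" for each non-key column
        List<String> values = new ArrayList<>();  // one "?" per column for the INSERT branch
        for (String c : columns) {
            if (keys.contains(c)) {
                using.add("? AS " + c);
                on.add("TMP." + c + " = A." + c);
            } else {
                set.add(c + " = ?");
            }
            values.add("?");
        }
        return "MERGE INTO %s A USING ( SELECT " + String.join(",", using)
                + " FROM DUAL ) TMP ON (" + String.join(" AND ", on)
                + " ) WHEN MATCHED THEN UPDATE SET " + String.join(",", set)
                + " WHEN NOT MATCHED THEN INSERT (" + String.join(",", columns)
                + ") VALUES(" + String.join(",", values) + ")";
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("\"ID\"", "\"USER_ID\"", "\"USER_NAME\"");
        List<String> keys = Arrays.asList("\"ID\"", "\"USER_ID\"");
        System.out.println(String.format(buildMergeTemplate(columns, keys), "USERS"));
    }
}
```

Running it prints a single line:

MERGE INTO USERS A USING ( SELECT ? AS "ID",? AS "USER_ID" FROM DUAL ) TMP ON (TMP."ID" = A."ID" AND TMP."USER_ID" = A."USER_ID" ) WHEN MATCHED THEN UPDATE SET "USER_NAME" = ? WHEN NOT MATCHED THEN INSERT ("ID","USER_ID","USER_NAME") VALUES(?,?,?)

Note that the statement contains six placeholders for three columns: the key columns, then the non-key columns, then all columns again. Also, if every column is listed as a key, the UPDATE SET list comes out empty and the statement is invalid, so at least one non-key column is needed.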

2.3 CommonRdbmsWriter.Task modification

Modify the startWriteWithConnection() method

/**
 * Modified to support Oracle update mode.
 * @author 程序员大佬超
 * @date 20221202
 * @param recordReceiver
 * @param taskPluginCollector
 * @param connection
 */
public void startWriteWithConnection(RecordReceiver recordReceiver, TaskPluginCollector taskPluginCollector, Connection connection)
{
    this.taskPluginCollector = taskPluginCollector;
    List<String> mergeColumns = new ArrayList<>();

    if (this.dataBaseType == DataBaseType.Oracle && !"insert".equalsIgnoreCase(this.writeMode)) {
        LOG.info("write oracle using {} mode", this.writeMode);
        List<String> columnsOne = new ArrayList<>();  // key columns listed in writeMode
        List<String> columnsTwo = new ArrayList<>();  // all other columns
        String merge = this.writeMode;
        String[] sArray = WriterUtil.getStrings(merge);
        for (String s : this.columns) {
            if (Arrays.asList(sArray).contains(s)) {
                columnsOne.add(s);
            }
        }
        for (String s : this.columns) {
            if (!Arrays.asList(sArray).contains(s)) {
                columnsTwo.add(s);
            }
        }
        // Same order as the placeholders of the MERGE statement: key columns, then the rest
        int i = 0;
        for (String column : columnsOne) {
            mergeColumns.add(i++, column);
        }
        for (String column : columnsTwo) {
            mergeColumns.add(i++, column);
        }
    }
    // The INSERT branch (or a plain insert) binds all columns in their configured order
    mergeColumns.addAll(this.columns);

    // Column metadata of the target table, used to convert value types when writing
    this.resultSetMetaData = DBUtil.getColumnMetaData(connection,
                                                      this.table, StringUtils.join(mergeColumns, ","));
    // Build the SQL statement used to write to the database
    calcWriteRecordSql();

    List<Record> writeBuffer = new ArrayList<>(this.batchSize);
    int bufferBytes = 0;
    try {
        Record record;
        while ((record = recordReceiver.getFromReader()) != null) {
            if (record.getColumnNumber() != this.columnNumber) {
                // The number of columns read from the source differs from the number
                // of columns to write to the target table: fail immediately
                throw DataXException
                    .asDataXException(
                    DBUtilErrorCode.CONF_ERROR,
                    String.format(
                        "列配置信息有错误. 因为您配置的任务中,源头读取字段数:%s 与 目的表要写入的字段数:%s 不相等. 请检查您的配置并作出修改.",
                        record.getColumnNumber(),
                        this.columnNumber));
            }

            writeBuffer.add(record);
            bufferBytes += record.getMemorySize();

            if (writeBuffer.size() >= batchSize || bufferBytes >= batchByteSize) {
                doBatchInsert(connection, writeBuffer);
                writeBuffer.clear();
                bufferBytes = 0;
            }
        }
        if (!writeBuffer.isEmpty()) {
            doBatchInsert(connection, writeBuffer);
            writeBuffer.clear();
        }
    }
    catch (Exception e) {
        throw DataXException.asDataXException(
            DBUtilErrorCode.WRITE_DATA_ERROR, e);
    }
    finally {
        writeBuffer.clear();
        DBUtil.closeDBResources(null, null, connection);
    }
}

Modify the doBatchInsert() method

/**
 * Modified to support Oracle update mode.
 * @author 程序员大佬超
 * @date 20221202
 * @param connection
 * @param buffer
 * @throws SQLException
 */
protected void doBatchInsert(Connection connection, List<Record> buffer)
    throws SQLException
{
    PreparedStatement preparedStatement = null;
    try {
        connection.setAutoCommit(false);
        preparedStatement = connection
            .prepareStatement(this.writeRecordSql);
        if (this.dataBaseType == DataBaseType.Oracle && !"insert".equalsIgnoreCase(this.writeMode)) {
            String merge = this.writeMode;
            String[] sArray = WriterUtil.getStrings(merge);
            for (Record record : buffer) {
                // Rearrange the values to match the placeholders of the MERGE statement
                List<Column> recordOne = new ArrayList<>();
                // 1) key columns (USING sub-query)
                for (int j = 0; j < this.columns.size(); j++) {
                    if (Arrays.asList(sArray).contains(this.columns.get(j))) {
                        recordOne.add(record.getColumn(j));
                    }
                }
                // 2) non-key columns (UPDATE SET)
                for (int j = 0; j < this.columns.size(); j++) {
                    if (!Arrays.asList(sArray).contains(this.columns.get(j))) {
                        recordOne.add(record.getColumn(j));
                    }
                }
                // 3) all columns in their configured order (INSERT VALUES)
                for (int j = 0; j < this.columns.size(); j++) {
                    recordOne.add(record.getColumn(j));
                }
                // Write the rearranged values back into the record
                for (int j = 0; j < recordOne.size(); j++) {
                    record.setColumn(j, recordOne.get(j));
                }
                preparedStatement = fillPreparedStatement(
                    preparedStatement, record);
                preparedStatement.addBatch();
            }
        }
        else {
            for (Record record : buffer) {
                preparedStatement = fillPreparedStatement(
                    preparedStatement, record);
                preparedStatement.addBatch();
            }
        }
        preparedStatement.executeBatch();
        connection.commit();
    }
    catch (SQLException e) {
        // Roll back this batch and fall back to committing one row at a time
        LOG.warn("回滚此次写入, 采用每次写入一行方式提交. 因为: {}", e.getMessage());
        connection.rollback();
        doOneInsert(connection, buffer);
    }
    catch (Exception e) {
        throw DataXException.asDataXException(
            DBUtilErrorCode.WRITE_DATA_ERROR, e);
    }
    finally {
        DBUtil.closeDBResources(preparedStatement, null);
    }
}
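
The rearrangement in doBatchInsert() follows the placeholder order of the MERGE statement: key columns first, then non-key columns, then all columns once more for the INSERT branch. The sketch below computes that order as a list of source column indexes. MergeBindOrderDemo and bindOrder are hypothetical names for illustration only and do not exist in DataX.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeBindOrderDemo {

    // Hypothetical helper: for each placeholder of the MERGE statement, returns the
    // index of the source column whose value is bound at that position.
    static List<Integer> bindOrder(List<String> columns, List<String> keys) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < columns.size(); i++) {   // 1) USING sub-query: key columns
            if (keys.contains(columns.get(i))) {
                order.add(i);
            }
        }
        for (int i = 0; i < columns.size(); i++) {   // 2) UPDATE SET: non-key columns
            if (!keys.contains(columns.get(i))) {
                order.add(i);
            }
        }
        for (int i = 0; i < columns.size(); i++) {   // 3) INSERT VALUES: all columns
            order.add(i);
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("\"ID\"", "\"USER_NAME\"", "\"USER_ID\"");
        List<String> keys = Arrays.asList("\"ID\"", "\"USER_ID\"");
        System.out.println(bindOrder(columns, keys)); // prints [0, 2, 1, 0, 1, 2]
    }
}
```

One possible design choice, offered only as a suggestion: binding values through such an index list would leave each Record unchanged, whereas the method above writes the rearranged values back into the record, so the record holds twice as many columns afterwards.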

Modify the fillPreparedStatement() method

/**
 * Modified to support Oracle update mode.
 * @author 程序员大佬超
 * @date 20221202
 * @param preparedStatement
 * @param record
 * @return
 * @throws SQLException
 */
protected PreparedStatement fillPreparedStatement(PreparedStatement preparedStatement, Record record)
    throws SQLException
{
    // Iterate over the record's own column count: in Oracle update mode the record
    // has been expanded to match all placeholders of the MERGE statement
    for (int i = 0; i < record.getColumnNumber(); i++) {
        int columnSqltype = this.resultSetMetaData.getMiddle().get(i);
        preparedStatement = fillPreparedStatementColumnType(preparedStatement, i,
                                                            columnSqltype, record.getColumn(i));
    }
    return preparedStatement;
}

2.4 Testing

After repackaging, test with the following job:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 3,
        "byte": 1048576
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "123456",
            "column": [
              "`id`",
              "`user_id`",
              "`user_password`",
              "`user_name`",
              "`user_phone`",
              "`email`",
              "`nick_name`",
              "`head_url`",
              "`sex`",
              "`state`",
              "`create_time`",
              "`create_user`",
              "`lastUpdateTime`"
            ],
            "splitPk": "",
            "connection": [
              {
                "table": [
                  "users"
                ],
                "jdbcUrl": [
                  "jdbc:mysql://127.0.0.1:3306/im"
                ]
              }
            ]
          }
        },
        "writer": {
          "name": "oraclewriter",
          "parameter": {
            "username": "yxc",
            "password": "123456",
            "column": [
              "\"ID\"",
              "\"USER_ID\"",
              "\"USER_PASSWORD\"",
              "\"USER_NAME\"",
              "\"USER_PHONE\"",
              "\"EMAIL\"",
              "\"NICK_NAME\"",
              "\"HEAD_URL\"",
              "\"SEX\"",
              "\"STATE\"",
              "\"CREATE_TIME\"",
              "\"CREATE_USER\"",
              "\"LASTUPDATETIME\""
            ],
            "writeMode": "update(\"ID\",\"USER_ID\")",
            "connection": [
              {
                "table": [
                  "USERS"
                ],
                "jdbcUrl": "jdbc:oracle:thin:@//192.168.157.142:1521/orcl"
              }
            ]
          }
        }
      }
    ]
  }
}

Note: the fields inside the update(...) brackets of writeMode must be wrapped in escaped double quotes (\"), written exactly as they appear in the writer's column list.
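
The reason is that the modified code compares the names taken from writeMode with the entries of the column list by exact string equality, so "ID" with quotes and ID without quotes are different names. In addition, getStrings() removes only the lowercase literal update, so a writeMode written as UPDATE(...) would not be parsed as intended. The following is a minimal sketch of a more tolerant parser; WriteModeKeyParser and parseKeys are hypothetical names for illustration, not part of DataX or of the modification above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WriteModeKeyParser {

    // Matches update(...) in any letter case and captures the text between the parentheses.
    private static final Pattern UPDATE_MODE =
            Pattern.compile("^\\s*update\\s*\\((.*)\\)\\s*$", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the key column names exactly as written in writeMode.
    static List<String> parseKeys(String writeMode) {
        Matcher m = UPDATE_MODE.matcher(writeMode);
        if (!m.matches()) {
            throw new IllegalArgumentException("expected update(col1,col2,...): " + writeMode);
        }
        List<String> keys = new ArrayList<>();
        for (String part : m.group(1).split(",")) {
            String key = part.trim();   // drop spaces around each name, keep the quotes
            if (!key.isEmpty()) {
                keys.add(key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(parseKeys("update(\"ID\",\"USER_ID\")"));     // ["ID", "USER_ID"]
        System.out.println(parseKeys("UPDATE( \"ID\" , \"USER_ID\" )")); // ["ID", "USER_ID"]
        System.out.println(parseKeys("update(ID)"));                     // [ID]
    }
}
```

The last call shows the pitfall from the note: the unquoted ID is returned as-is and would not equal the column entry "ID" used in the job above.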

Looking at the run log, you can see that the MERGE statement has been spliced correctly.

The task executed successfully. Switching between insert and update modes and checking the results showed no abnormalities.



For more technical content, please keep following Programmer Dachao.
Original content takes effort; please credit the source when reprinting.

Origin blog.csdn.net/xch_yang/article/details/128250190