Oracle data synchronization with the yugong project

  • Overview
I had previously tried an Oracle data synchronization scheme based on materialized views plus a java source. To deliver the change information from the materialized view to the java source and on to an external program, triggers and stored procedures/functions are needed: a trigger monitors data changes on the materialized view and calls a stored procedure, which in turn invokes the java source (a stored procedure can be mapped to a java source). The disadvantages of this scheme are as follows (a sketch of such a java source appears after the list):
1. Creating a materialized view for every synchronized table consumes storage.
2. Some of the java source code may depend on third-party packages, so a large number of external jar files has to be loaded onto the database server.
3. A trigger has to be created on each materialized view to monitor data changes. (After reading yugong I wondered about creating the trigger directly on the materialized view log instead, but that would be fairly cumbersome: you would have to fetch the data from the master table according to the log records, pass it to the java source, and finally delete the log entries.)
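
For illustration, here is a minimal sketch of what such a java source might look like: a class loaded into the database with CREATE OR REPLACE JAVA SOURCE, whose static method is then mapped to a PL/SQL procedure called by the trigger. The class name, host, port, and payload format are all assumptions for the sketch, not the original code.

import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

public class ChangeNotifier {

    // Called from a PL/SQL wrapper procedure; pushes one change event
    // (table name, primary key value, operation type) to an external listener.
    public static void notifyChange(String tableName, String pk, String op) throws Exception {
        Socket socket = new Socket("127.0.0.1", 9099);
        try {
            Writer out = new OutputStreamWriter(socket.getOutputStream(), "UTF-8");
            out.write(tableName + "," + pk + "," + op + "\n");
            out.flush();
        } finally {
            socket.close();
        }
    }
}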

  • yugong
Recently I came across yugong, Alibaba's Oracle database migration project, which is also implemented on top of materialized views. The differences from the scheme above are:
1. Although both are based on materialized views, yugong only uses the materialized view log: the log is created with the PRIMARY KEY and SEQUENCE options, so it contains the primary key columns and an operation sequence number. The extractor pulls change records ordered by SEQUENCE$$ and deletes them from the log table once extracted (see the sketch after this list). The original scheme instead relied on fast refresh: after a commit the materialized view refreshes automatically and the log is cleared.
2. Data extraction in yugong (the extractor part, which reads from the source library) uses JDBC, fetching rows from the source table by the primary key columns recorded in the log table; the original scheme used triggers to capture data changes.
3. The applier part of yugong (which writes to the target library) also uses JDBC, applying the translated data directly to the target; the original scheme used the java source to push changed data to an external program for processing.
4. yugong introduces a Translator for heterogeneous data conversion; the original scheme left that to an external consumer.
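
As a concrete example, the materialized view log that yugong's incremental mode depends on is created roughly like this (executed over JDBC here; the table name T_ORDER, connection URL, and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateMviewLog {

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:orcl", "scott", "tiger");
        Statement stmt = conn.createStatement();
        try {
            // PRIMARY KEY stores the key columns in the log table (MLOG$_T_ORDER);
            // SEQUENCE adds the SEQUENCE$$ column that the extractor sorts by
            stmt.execute("CREATE MATERIALIZED VIEW LOG ON T_ORDER"
                    + " WITH PRIMARY KEY, SEQUENCE");
        } finally {
            stmt.close();
            conn.close();
        }
    }
}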

For a detailed introduction to yugong, see:
https://github.com/alibaba/yugong/wiki/AdminGuide
http://blog.csdn.net/sunnylinner/article/details/52064637
For a detailed introduction to materialized views, see:
http://www.cnblogs.com/linjiqin/archive/2012/05/23/2514795.html

The rest of this post covers the problems encountered while using yugong:
1. "SEQUENCE$$: invalid identifier": SEQUENCE$$ is used to sort the incremental records so that operations can be replayed in order. The materialized view log on my test table had been created earlier, without the SEQUENCE option, so the log had no such column; recreating the log with SEQUENCE resolves it.
2. yugong.extractor.noupdate.thresold: pay attention to this setting. If it is less than or equal to 0, the extractor stays in incremental mode indefinitely; if it is greater than 0, it runs in catch-up mode, and once the number of increment rounds without updates exceeds the threshold, the increment terminates and the worker thread is released to the next table. It has to be tuned together with yugong.table.concurrent.size. For example, with thresold=0 and concurrent.size=1 and two tables to synchronize, only one table is ever syncing while the other waits, because there is a single worker thread and the thresold=0 thread is never released. Likewise, with thresold=3 and concurrent.size=5 and the same two tables, the synchronization threads are not released, simply because there are more worker threads than tables. So for continuous synchronization of many tables, set thresold=0 and concurrent.size=n, where n is at least the number of tables; n can get very large, or you can find a way to put already-synchronized tables back into the synchronization queue. A sketch of the relevant configuration follows this list.
3. When many tables are synchronized continuously, parallel mode is enabled and both extraction and loading into the target library are multi-threaded, so yugong should be deployed on a machine of its own. The precondition is that both databases can be reached directly.
4. Each table in yugong corresponds to one instance that handles that table's migration and contains the extractor, translator, and applier. The extractor cannot be split off from the translator and applier and deployed independently, so this does not fit scenarios where the target library cannot be connected to directly.
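
For example, to keep two tables in continuous incremental synchronization, the relevant configuration entries might look like this (a sketch; all other entries omitted):

# <= 0: stay in incremental mode and never release the worker thread
yugong.extractor.noupdate.thresold=0
# worker threads; must be >= the number of tables being synchronized
yugong.table.concurrent.size=2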

Clob fields cannot be synchronized properly: a Clob column's value is converted to a String during extraction, but ColumnMeta.type is left unchanged, so the value no longer matches the declared type. When the data is inserted into or updated in the target library via ps.setObject(index, cv.getValue(), cv.getColumn().getType()), the String cannot be converted to a Clob and the SQL fails. Solution: reset the column type to match the value with col.setType(Types.VARCHAR):
public abstract class AbstractOracleRecordExtractor extends AbstractRecordExtractor {

    protected ColumnValue getColumnValue(ResultSet rs, String encoding, ColumnMeta col) throws SQLException {
        Object value = null;
        // ... branches for the other column types unchanged ...
        if (YuGongUtils.isClobType(col.getType())) {
            value = rs.getString(col.getName());
            // downgrade the declared type so it matches the String value
            // when the applier later calls ps.setObject(...)
            col.setType(Types.VARCHAR);
        }
        // ... wrap value into a ColumnValue and return it ...
    }

}

Blob fields also cannot be synchronized normally: the same problem as Clob. The value is converted to byte[] during extraction, so value and type are inconsistent when ps.setObject is called.

Note: my first idea was not to convert at all and take the Blob directly, so that value and type stay consistent, but that fails too, with "table or view does not exist". The reason is that a Blob is implemented as a LOCATOR that points at the SQL BLOB inside its own database; a BLOB locator from library A cannot be applied as a value in library B. For background on BLOBs see: http://blog.csdn.net/terryzero/article/details/3939014

Solution: in the Applier, where ps.setObject(index, cv.getValue(), cv.getColumn().getType()) is called, branch on the column type: when it is Types.BLOB, execute ps.setBinaryStream(index, new ByteArrayInputStream((byte[]) cv.getValue())) instead:
if (cv.getColumn().getType() == Types.BLOB) {
    // stream the extracted byte[] instead of passing it as a BLOB
    ps.setBinaryStream(index, new ByteArrayInputStream((byte[]) cv.getValue()));
} else {
    ps.setObject(index, cv.getValue(), cv.getColumn().getType());
}
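
Binding through setBinaryStream lets the JDBC driver materialize the bytes into the target column itself, so no BLOB locator from the source session ever reaches the target connection.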

The same approach works for CLOB: the value is still converted to a String, and the placeholder is bound with a character stream:
if (cv.getColumn().getType() == Types.CLOB) {
    // bind the String through a Reader rather than as a CLOB object
    ps.setCharacterStream(index, new StringReader((String) cv.getValue()));
} else {
    ps.setObject(index, cv.getValue(), cv.getColumn().getType());
}

java.lang.AbstractMethodError: oracle.jdbc.driver.T4CPreparedStatement.setBlob(ILjava/io/InputStream : an Oracle JDBC driver version problem. PreparedStatement.setBlob(int, InputStream) was introduced in JDBC 4.0, and older drivers compiled against earlier JDBC versions do not implement it; switching the database driver to ojdbc6.jar resolves it.

Global schema problem:
1. Source library: when no tables are specified, all tables under the connected schema are taken by default. When a table is specified without a schema, multiple matches are possible (the same table name existing in several schemas); in that case the schema must be specified. See the source: TableMetaGenerator.getTableMetasWithoutColumn.
2. Target library: the source library's schema is used by default. When the source and target schemas differ, a global schema converter has to be added. The implementation is as follows:
  • Add a configuration item: yugong.applier.table.schema
  • Add a global schema conversion class, SchemaDataTranslator:
public class SchemaDataTranslator extends AbstractDataTranslator implements DataTranslator {

    private String tableSchema;

    public SchemaDataTranslator(String tableSchema) {
        this.tableSchema = tableSchema;
    }

    public String translatorSchema() {
        return tableSchema;
    }

    public List<Record> translator(List<Record> records) {
        // rewrite every record's schema to the configured target schema
        for (Record record : records) {
            String schema = translatorSchema();
            if (schema != null) {
                record.setSchemaName(schema);
            }
        }
        return records;
    }
}
  • Wire the Translator in when the instance is initialized: if yugong.applier.table.schema is set and no per-table translator exists, fall back to SchemaDataTranslator. Modify YuGongController.buildTranslator as follows:
private DataTranslator buildTranslator(String name) throws Exception {
    String tableName = YuGongUtils.toPascalCase(name);
    String translatorName = tableName + "DataTranslator";
    String packageName = DataTranslator.class.getPackage().getName();
    Class<?> clazz = null;
    try {
        clazz = Class.forName(packageName + "." + translatorName);
    } catch (ClassNotFoundException e) {
        File file = new File(translatorDir, translatorName + ".java");
        if (!file.exists()) {
            // Compatible with table names
            file = new File(translatorDir, tableName + ".java");
            if (!file.exists()) {
                // no per-table translator found: fall back to the global
                // schema translator when yugong.applier.table.schema is set
                String targetSchema = config.getString("yugong.applier.table.schema", null);
                if (StringUtils.isNotBlank(targetSchema)) {
                    return new SchemaDataTranslator(targetSchema);
                }
                return null;
            }
        }

        String javaSource = StringUtils.join(IOUtils.readLines(new FileInputStream(file)), "\n");
        clazz = compiler.compile(javaSource);
    }

    return (DataTranslator) clazz.newInstance();
}
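
With these changes in place, redirecting all records to a different target schema only requires setting the new property, for example (the schema name is a placeholder):

yugong.applier.table.schema=TARGET_USER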
