Environment setup for the data synchronization component otter

1. Introduction to otter

    Part of this description is drawn from the Alibaba otter project's wiki: https://github.com/alibaba/otter/wiki

    Otter is an incremental data synchronization tool from Alibaba. It is a distributed database synchronization system that parses a database's incremental logs and synchronizes the changes, in near real time, to a mysql/oracle database in the same or a remote data center.

    The company recently needed to synchronize offline data to a cloud warehouse and chose otter for incremental mysql synchronization, so I spent a few weeks stepping into its pitfalls. Otter can also do full data synchronization, but that is cumbersome; consider doing the full load by other means first and then letting otter handle the increments.

[Figure: how otter works]

    I won't cover the rest here; the Alibaba wiki has a more detailed introduction. Below are the main setup steps and the pitfalls I hit.

2. Environment Construction

1. MySQL

    The source mysql database must have binlog enabled: otter is based on canal, and canal is based on binlog, so the first step is to enable binlog on MySQL.

To enable binlog: on Linux, edit the my.cnf file; on Windows, edit MySQL's my.ini file. Details are easy to find online; a minimal sketch follows.
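A minimal sketch of the relevant my.cnf settings (the log base name and server-id are illustrative; canal requires row-format binlog):

[mysqld]
# enable the binary log; mysql-bin is the base name for the log files
log-bin = mysql-bin
# canal parses row-based replication events, so ROW format is required
binlog-format = ROW
# any value unique within the replication topology
server-id = 1

Restart MySQL afterwards and verify with: show variables like 'log_bin';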

2. Zookeeper setup

    zk can run standalone or as a cluster. I built a pseudo-distributed cluster, since I didn't have that many machines; see my other blog post for the detailed steps.
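For reference, a minimal standalone zoo.cfg sketch (dataDir is a placeholder; adjust it to your environment):

# zoo.cfg - minimal standalone setup
tickTime=2000                 # basic time unit in milliseconds
dataDir=/var/lib/zookeeper    # snapshot storage (placeholder path)
clientPort=2181               # the port referenced by otter.zookeeper.cluster.default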

3. otter configuration

  a) Execute the SQL file: create a new database named otter for otter (any name works), find otter-manager-schema.sql in the download package (https://github.com/alibaba/otter/releases), and execute it; the otter database is then populated. This database mainly stores the otter manager's configuration and some data produced while otter is running.
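A sketch of the import using the mysql command-line client (the file path is a placeholder):

CREATE DATABASE otter DEFAULT CHARACTER SET utf8;
USE otter;
-- load the manager schema shipped in the release package
SOURCE /path/to/otter-manager-schema.sql;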

    b) Modify otter's configuration file: conf/manager.properties

## otter manager domain name
otter.domainName = 127.0.0.1 # the IP otter exposes for external access
## otter manager http port
otter.port = 8080   # port for accessing otter's web UI
## jetty web config xml
otter.jetty = jetty.xml

## otter manager database config
otter.database.driver.class.name = com.mysql.jdbc.Driver
otter.database.driver.url = jdbc:mysql://127.0.0.1:3306/otter   # database connection
otter.database.driver.username = root  # username
otter.database.driver.password = Geekplus@2017  # password

## otter communication port
otter.communication.manager.port = 1099   # manager communication port

## otter communication pool size
otter.communication.pool.size = 10

## default zookeeper address
otter.zookeeper.cluster.default = 127.0.0.1:2181  # zk address
## default zookeeper session timeout = 60s
otter.zookeeper.sessionTimeout = 60000

## otter arbitrate connect manager config
otter.manager.address = ${otter.domainName}:${otter.communication.manager.port}

## should run in product mode , true/false
otter.manager.productionMode = true

## self-monitor enable or disable
otter.manager.monitor.self.enable = true
## self-monitor interval , default 120s
otter.manager.monitor.self.interval = 120
## auto-recovery paused enable or disable
otter.manager.monitor.recovery.paused = true
# manager email user config: where alert emails are sent during execution
otter.manager.monitor.email.host = xxx 
otter.manager.monitor.email.username = xxxx
otter.manager.monitor.email.password = xxx
otter.manager.monitor.email.stmp.port = 465

    After the configuration is complete, start the manager (see the commands below); about 10 seconds later it is reachable at http://127.0.0.1:8080. To perform any operations, log in at the upper right corner.

Default username/password: admin/admin
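For reference, starting the manager, assuming the standard bin/ layout of the manager release package:

# from the unpacked manager directory
sh bin/startup.sh
# stop it later with: sh bin/stop.sh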

4. Node management

    Nodes are configured in [Usage management] (section 3 below); a node should be started only after the manager is configured.

3. Usage management

1. Machine management

First configure [zookeeper management] in the [Machine Management] option

Then add a zk cluster

Add node information in [node management]

After saving, the node just configured will appear on the [node] management page

The serial number in the list is the number the manager assigns to the node; it is needed when configuring the node.

node configuration:

    1" The configuration file is in /conf/otter.properties of the node package

# otter node root dir
otter.nodeHome = ${user.dir}/../

## otter node dir
otter.htdocs.dir = ${otter.nodeHome}/htdocs
otter.download.dir = ${otter.nodeHome}/download
otter.extend.dir= ${otter.nodeHome}/extend

## default zookeeper session timeout = 60s
otter.zookeeper.sessionTimeout = 60000

## otter communication pool size
otter.communication.pool.size = 10

## otter arbitrate & node connect manager config
otter.manager.address = 127.0.0.1:1099  # the address the node uses to connect to the manager: the IP is the manager's address,
# and the port is the otter.communication.manager.port = 1099 setting in the manager's configuration file

    2" Add a file nid to the conf directory, and write the serial number just assigned by the manager in the file, such as 3.

    After completing steps 1 and 2, the node can be started (see the commands below).
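Both steps as commands, assuming the standard bin/ layout of the node package and that the manager assigned serial number 3:

# from the unpacked node directory: record the id assigned by the manager
echo 3 > conf/nid
# then start the node
sh bin/startup.sh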

2. Configuration management

    Under configuration management, set up [Data Source Configuration], [Data Table Configuration], and [Canal Configuration]

    [Data source configuration] and [Data table configuration] are relatively simple

    [canal configuration] One canal corresponds to one database instance

    

3. Synchronization management

    1) Add a channel

    

    Notes:

    a. Synchronization Consistency

  1. Based on database back-query (in short, it force-queries the database: the pk is taken from binlog and the corresponding record is read back directly for synchronization, which avoids synchronizing an old version of the data when consuming binlog from several days ago)
  2. Based on the current log change (synchronizes using the changed field values parsed from binlog/redolog, with no database back-query; recommended)

    2) Add a pipeline to the channel 

    

    

    Add monitoring to the pipeline

    

    Notes:

  • Delay time = the time data was successfully synchronized to the target database - the time the source database changed, in seconds. (The corresponding node pushes this metric periodically)
  • Last synchronization time = the most recent time data was successfully synchronized to the target database (for the tables this synchronization cares about, the last time a change landed in the target database)
  • Last position time = the binlog timestamp of the last recorded consumption position (different from the synchronization time: changes to other tables in the database will not advance the synchronization time, but will advance the position time)

    

    3) Configure the mapping relationship

    

    Custom data processing will be discussed later

    4) Field mapping configuration

    

    

4. Custom data processing

   Extract module:

  • EventProcessor : custom data processing; you can modify any content of a change record
  • FileResolver : resolves the association between data and files

    At present both support only the java language, but both support dynamic compilation and lib-package loading at runtime.

  1. Publish the source code directly through Otter Manager; it is pushed to the nodes and takes effect immediately, without restarting any java process, a bit like a dynamic language
  2. Place the class file in the extend directory, or package it into a jar on the node's startup classpath, and load it by specifying the class name through Otter Manager; this allows the business logic to be customized completely. (One drawback: if external packages added to the node classpath are used, e.g. for remote interface calls, EventProcessor invocations are currently serial, so serial remote calls will perform relatively poorly.)

        

1. The principle of EventProcessor

    I am currently using EventProcessor; source code: https://github.com/alibaba/otter.

    This module lives under the extract module of the source tree, full path com.alibaba.otter.node.etl.extract. The following configuration can be seen in the project's Spring file /spring/otter-node-extract.xml:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:aop="http://www.springframework.org/schema/aop"
	xmlns:tx="http://www.springframework.org/schema/tx"
	xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd	   http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-2.0.xsd	   http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-2.0.xsd"
	default-autowire="byName" default-dependency-check="none">

	<bean id="otterExtractorFactory" class="com.alibaba.otter.node.etl.extract.extractor.OtterExtractorFactory" scope="singleton">
		<property name="dbBatchExtractor">
			<list>
				<value>freedomExtractor</value>
				<value>groupExtractor</value>
				<value>databaseExtractor</value>
				<value>processorExtractor</value>
				<value>fileExtractor</value>
				<value>viewExtractor</value>
			</list>
		</property>
	</bean> 
	
	<!-- pooling configuration -->
	<bean id="databaseExtractor" class="org.springframework.aop.framework.ProxyFactoryBean">
		<property name="optimize" value="false"/>
   		<property name="proxyTargetClass" value="true" />
		<property name="targetSource" ref="databaseExtractorTargetSource" />
	</bean>
	<bean id="databaseExtractorTargetSource" class="org.springframework.aop.target.CommonsPoolTargetSource" >
		<property name="minIdle" value="1" />
		<property name="maxSize" value="-1" />
		<property name="timeBetweenEvictionRunsMillis" value="60000" /><!-- 1分钟进行一次回收 -->
		<property name="minEvictableIdleTimeMillis" value="600000" /><!-- 10分钟回收空闲的 -->
		<property name="targetBeanName" value="databaseExtractorTarget" />
	</bean>
	<bean id="databaseExtractorTarget" class="com.alibaba.otter.node.etl.extract.extractor.DatabaseExtractor" scope="prototype" >
		<property name="poolSize" value="5" />
	</bean>
	
	<bean id="fileExtractor" class="com.alibaba.otter.node.etl.extract.extractor.FileExtractor" scope="singleton" >
	</bean>
	
	<bean id="freedomExtractor" class="com.alibaba.otter.node.etl.extract.extractor.FreedomExtractor" scope="singleton" >
	</bean>
	
	<bean id="viewExtractor" class="com.alibaba.otter.node.etl.extract.extractor.ViewExtractor" scope="singleton" >
	</bean>
	
	<bean id="groupExtractor" class="com.alibaba.otter.node.etl.extract.extractor.GroupExtractor" scope="singleton" >
	</bean>
	
	<bean id="processorExtractor" class="com.alibaba.otter.node.etl.extract.extractor.ProcessorExtractor" scope="singleton" >
	</bean>
</beans>

    Then, looking at the OtterExtractorFactory factory class, we can see that it traverses dbBatchExtractor and invokes each extractor's otterExtractor.extract(dbBatch) method to process a batch of data:

 public void extract(DbBatch dbBatch) {
        Assert.notNull(dbBatch);
        for (Object extractor : dbBatchExtractor) {
            OtterExtractor otterExtractor = null;
            if (extractor instanceof java.lang.String) {
                // fetched from the container on each call; instances are pooled
                otterExtractor = (OtterExtractor) beanFactory.getBean((String) extractor, OtterExtractor.class);
            } else {
                otterExtractor = (OtterExtractor) extractor;
            }

            otterExtractor.extract(dbBatch);
        }
    }

    Returning to the XML configuration:

<bean id="otterExtractorFactory" class="com.alibaba.otter.node.etl.extract.extractor.OtterExtractorFactory" scope="singleton">
		<property name="dbBatchExtractor">
			<list>
				<value>freedomExtractor</value>  <!-- the freegate extractor -->
				<value>groupExtractor</value>    <!-- not sure what this is; it seems to be the group from the data-table configuration -->
				<value>databaseExtractor</value>    <!-- performs the database back-query; freegate data and tables configured for back-query-based sync go through here -->
				<value>processorExtractor</value>    <!-- the custom extractor; this is the entry point for our custom data processing -->
				<value>fileExtractor</value>    <!-- the file extractor; I have no use for it -->
				<value>viewExtractor</value>    <!-- not sure what this one does -->
			</list>
		</property>
	</bean> 

All extractors are defined here; our custom data processing happens in the processorExtractor.

2. Implementation of data customization

    Steps: add the dependency packages → extend AbstractEventProcessor → write the business code → deploy

    Here is an example implementation:

Add these two Maven dependencies:

<!-- https://mvnrepository.com/artifact/com.alibaba.otter/shared.etl -->
		<dependency>
			<groupId>com.alibaba.otter</groupId>
			<artifactId>shared.etl</artifactId>
			<version>4.2.15</version>
		</dependency>
		<!-- https://mvnrepository.com/artifact/com.alibaba.otter/node.extend -->
		<dependency>
			<groupId>com.alibaba.otter</groupId>
			<artifactId>node.extend</artifactId>
			<version>4.2.15</version>
		</dependency>

Implementation:

package com.geekplus.db.yugong.foshan;

import java.util.List;
import org.apache.commons.collections.CollectionUtils;
import org.apache.commons.lang.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.otter.node.extend.processor.AbstractEventProcessor;
import com.alibaba.otter.shared.etl.model.EventColumn;
import com.alibaba.otter.shared.etl.model.EventData;
import com.alibaba.otter.shared.etl.model.EventType;

public class PrimaryKeyTransfer extends AbstractEventProcessor {
    private final static Logger logger = LoggerFactory.getLogger(PrimaryKeyTransfer.class);

    /***
     * Processes one row of change data at a time.
     */
    public boolean process(EventData eventData) {
        boolean isHandle = true;
        if (eventData == null)
            return isHandle;
        logger.info("eventdata ,tableName:{},tableID:{},db:{},ddlTable:{}", eventData.getTableId(),
                eventData.getTableName(), eventData.getSchemaName(), eventData.getDdlSchemaName());
        if (StringUtils.equals("out_order_back", eventData.getTableName())
                || StringUtils.equals("out_order_back", eventData.getTableName())) {
            return parseOrder(eventData);
        } else if (StringUtils.equals("out_order_pkg", eventData.getTableName())
                || StringUtils.equals("out_order_pkg", eventData.getTableName())) {
            return parsePkg(eventData);
        } else if (StringUtils.equals("t_base_warehouse", eventData.getTableName())
                || StringUtils.equals("t_base_warehouse", eventData.getTableName())) {
            return parseWarehouse(eventData);
        } else if (StringUtils.equals("t_base_customer", eventData.getTableName())
                || StringUtils.equals("t_base_owner", eventData.getTableName())) {
            return parseCustomer(eventData);
        } else if (StringUtils.equals("t_base_carrier_info", eventData.getTableName())
                || StringUtils.equals("t_base_carrier_info", eventData.getTableName())) {
            return parseCarrier(eventData);
        }
        return isHandle;
    }

    private EventColumn findColumn(List<EventColumn> columns, String columnName) {
        for (EventColumn column : columns) {
            if (StringUtils.equals(columnName, column.getColumnName())) {
                return column;
            }
        }
        return null;
    }

    private boolean parseOrder(EventData eventData) {
        EventColumn normalWarehouseCl = findColumn(eventData.getColumns(), "warehouse_code");
        if (normalWarehouseCl == null)
            return false;
        eventData.getKeys().add(normalWarehouseCl);
        eventData.getColumns().remove(normalWarehouseCl);
        // handle insert: nothing extra to do
        if (EventType.INSERT.equals(eventData.getEventType())) {

        }
        // handle update
        if (EventType.UPDATE.equals(eventData.getEventType())) {
            // if the primary key itself was changed
            if (CollectionUtils.isNotEmpty(eventData.getOldKeys())) {
                eventData.getOldKeys().add(normalWarehouseCl.clone());
            }
        }
        // handle delete: nothing extra to do
        if (EventType.DELETE.equals(eventData.getEventType())) {

        }
        return true;
    }

    private boolean parsePkg(EventData eventData) {
        if (EventType.INSERT.equals(eventData.getEventType())) {
            return true;
        }
        if (EventType.UPDATE.equals(eventData.getEventType())) {
            EventColumn normalOrderCodeCl = findColumn(eventData.getKeys(), "out_order_code");
            EventColumn normalWarehouseCl = findColumn(eventData.getKeys(), "warehouse_code");
            if (normalOrderCodeCl == null || normalWarehouseCl == null)
                return false;
            eventData.getColumns().clear();
            moveDate(eventData.getKeys(), eventData.getColumns());
            eventData.getKeys().clear();
            eventData.getOldKeys().clear();
            // keys and oldKeys must be in the same order
            eventData.getKeys().add(normalOrderCodeCl);
            eventData.getKeys().add(normalWarehouseCl);
            eventData.getOldKeys().add(normalOrderCodeCl);
            eventData.getOldKeys().add(normalWarehouseCl);
            return true;
        }
        if (EventType.DELETE.equals(eventData.getEventType())) {
            return true;
        }
        return false;
    }

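    // copy each column from source into dest unless dest already has a column with the same name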
    private void moveDate(List<EventColumn> source, List<EventColumn> dest) {
        for (EventColumn srcC : source) {
            boolean exist = false;
            for (EventColumn destC : dest) {
                if (StringUtils.equals(destC.getColumnName(), srcC.getColumnName())) {
                    exist = true;
                    break;
                }
            }
            if (!exist) {
                dest.add(srcC);
            }
        }
    }

    private boolean parseWarehouse(EventData eventData) {
        EventColumn normalWarehouseCl = findColumn(eventData.getColumns(), "warehouse_code");
        if (normalWarehouseCl == null)
            return false;
        eventData.getColumns().remove(normalWarehouseCl);
        eventData.getKeys().add(normalWarehouseCl);
        if (EventType.INSERT.equals(eventData.getEventType())) {

        }
        if (EventType.UPDATE.equals(eventData.getEventType())) {
            if (CollectionUtils.isNotEmpty(eventData.getOldKeys())) {
                eventData.getOldKeys().add(normalWarehouseCl.clone());
            }
        }
        if (EventType.DELETE.equals(eventData.getEventType())) {

        }
        return true;
    }

    private boolean parseCustomer(EventData eventData) {
        if (EventType.INSERT.equals(eventData.getEventType())) {

        }
        if (EventType.UPDATE.equals(eventData.getEventType())) {
            EventColumn pk = findColumn(eventData.getKeys(), "pk_t_base_customer_code");
            // guard against findColumn returning null for a missing key column
            if (pk != null) {
                eventData.getOldKeys().add(pk);
            }
        }
        if (EventType.DELETE.equals(eventData.getEventType())) {

        }
        return true;
    }

    private boolean parseCarrier(EventData eventData) {
        EventColumn carrierCodeCl = findColumn(eventData.getColumns(), "carrier_code");
        if (carrierCodeCl == null)
            return false;
        eventData.getColumns().remove(carrierCodeCl);
        eventData.getKeys().add(carrierCodeCl);
        if (EventType.INSERT.equals(eventData.getEventType())) {

        }
        if (EventType.UPDATE.equals(eventData.getEventType())) {
            if (CollectionUtils.isNotEmpty(eventData.getOldKeys())) {
                eventData.getOldKeys().add(carrierCodeCl.clone());
            }
        }
        if (EventType.DELETE.equals(eventData.getEventType())) {

        }
        return true;
    }
}

After that, put the built jar into the lib directory of each node, then restart the nodes for the change to take effect.
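A sketch of the deployment (the jar name and node path are placeholders):

# copy the custom processor jar onto each node's classpath
cp my-event-processor.jar /path/to/node/lib/
# restart the node so the jar is picked up
sh /path/to/node/bin/stop.sh
sh /path/to/node/bin/startup.sh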

 


5. Custom data synchronization (Freegate)

    Its main function is to trigger data synchronization without modifying the original table data.

It can be used for:

  • Correcting already-synchronized data
  • Full data synchronization. (Freegate triggers the full data; otter's incremental synchronization must be configured in row-record mode to avoid losing update operations when the record is missing from the target database during an update)

1. Principle

Main principle:

a. Insert specific data into the otter system table retl_buffer, including the table name and pk of the records to synchronize.

b. Once otter senses it, it extracts the corresponding data (the entire row) by table name and pk and synchronizes it to the target database alongside the normal incremental stream.

At present, the way the otter system senses freegate data is:

  • Log records. (Each change to the table requires binlog to be enabled; otter obtains the binlog data, extracts the table name and pk to synchronize, and then queries the source table back for the entire row)

Note that otter's freegate is intrusive to the source database: you need to add a retl database and a retl user whose password is retl. The SQL to execute is:

/*
for otter's use; otter needs read/write access to retl.* and to the business tables
1. create database retl
*/
CREATE DATABASE retl;

/* 2. grant privileges to the synchronization user */
CREATE USER retl@'%' IDENTIFIED BY 'retl';
GRANT USAGE ON *.* TO `retl`@'%';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO `retl`@'%';
GRANT SELECT, INSERT, UPDATE, DELETE, EXECUTE ON `retl`.* TO `retl`@'%';
/* business-table grants; this can be restricted to only the tables being synchronized */
GRANT SELECT, INSERT, UPDATE, DELETE ON *.* TO `retl`@'%';  

/* 3. create the system tables */
USE retl;
DROP TABLE IF EXISTS retl.retl_buffer;
DROP TABLE IF EXISTS retl.retl_mark;
DROP TABLE IF EXISTS retl.xdual;

CREATE TABLE retl_buffer
(	
	ID BIGINT(20) AUTO_INCREMENT,
	TABLE_ID INT(11) NOT NULL,
	FULL_NAME varchar(512),
	TYPE CHAR(1) NOT NULL,
	PK_DATA VARCHAR(256) NOT NULL,
	GMT_CREATE TIMESTAMP NOT NULL,
	GMT_MODIFIED TIMESTAMP NOT NULL,
	CONSTRAINT RETL_BUFFER_ID PRIMARY KEY (ID) 
)  ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE retl_mark
(	
	ID BIGINT AUTO_INCREMENT,
	CHANNEL_ID INT(11),
	CHANNEL_INFO varchar(128),
	CONSTRAINT RETL_MARK_ID PRIMARY KEY (ID) 
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

CREATE TABLE xdual (
  ID BIGINT(20) NOT NULL AUTO_INCREMENT,
  X timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (ID)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

/* 4. insert initialization data */
INSERT INTO retl.xdual(id, x) VALUES (1,now()) ON DUPLICATE KEY UPDATE x = now();

retl_buffer table structure:

CREATE TABLE retl_buffer
   (    
    ID BIGINT AUTO_INCREMENT, ## meaningless, just auto-increments
    TABLE_ID INT(11) NOT NULL, ## tableId, i.e. the serial-number column of the manager's data-table list (the original link http://otter.alibaba-inc.com/data_media_list.htm is Alibaba-internal). If the table is configured with a regex, set table_id to 0 and specify full_name.
    FULL_NAME varchar(512), ## schemaName + '.' + tableName (can be omitted if table_id is specified explicitly)
    TYPE CHAR(1) NOT NULL, ## I/U/D, corresponding to insert/update/delete respectively
    PK_DATA VARCHAR(256) NOT NULL, ## multiple pk values are separated by char(1)
    GMT_CREATE TIMESTAMP NOT NULL, ## meaningless, just the system time
    GMT_MODIFIED TIMESTAMP NOT NULL, ## meaningless, just the system time
    CONSTRAINT RETL_BUFFER_ID PRIMARY KEY (ID)
   )  ENGINE=InnoDB DEFAULT CHARSET=utf8;

2. Operation

Example of full synchronization operation:

insert into retl.retl_buffer(ID,TABLE_ID, FULL_NAME,TYPE,PK_DATA,GMT_CREATE,GMT_MODIFIED) (select null,0,'$schema.table$','I',id,now(),now() from $schema.table$);

If the table has multiple primary keys, PK_DATA must be the table's primary-key values spliced together with (char)1; the otter wiki's pattern for a primary key of (id, name) is concat(id, char(1), name) (a sketch is given after the column descriptions below).

Here's an actual example:

insert into `retl`.`retl_buffer` ( `TABLE_ID`, `FULL_NAME`, `TYPE`, `PK_DATA`, `GMT_CREATE`, `GMT_MODIFIED`) values ( '0', 'test.t_base_warehouse', 'I', '20', '2018-01-19 15:00:57', '2018-01-19 10:34:00');

Column meanings:

  • TABLE_ID: set to 0 when FULL_NAME is given (the auto-increment ID column does not need to be filled in)
  • FULL_NAME: source schema name + '.' + table name
  • TYPE: I/U/D, corresponding to insert/update/delete respectively
  • PK_DATA: the value of the primary key (note: the value, not the primary key's name); to synchronize several records, insert one buffer row per record
  • GMT_CREATE / GMT_MODIFIED: any time value will do
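For a composite primary key, a sketch following the wiki's concat pattern (test.t_order and its key columns (id, warehouse_code) are hypothetical):

insert into retl.retl_buffer(TABLE_ID, FULL_NAME, TYPE, PK_DATA, GMT_CREATE, GMT_MODIFIED)
  (select 0, 'test.t_order', 'I', concat(id, char(1), warehouse_code), now(), now() from test.t_order);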

At the end, I will find time to interpret the L (Load) stage next: how otter assembles the SQL applied to the target database.

 

 
