Canal realizes data heterogeneity

Problem:
In a large-scale website architecture, the database is split into multiple databases and tables (sharding) to solve capacity and performance problems. But this introduces a new problem: how to run queries along other dimensions, or aggregation queries, across the shards.
Solution:
This problem is generally solved with a data heterogeneity mechanism.

Specific example:
To improve the system's order-taking capacity, the order table is sharded into multiple databases and tables. This raises a question: how can a user query his own order list?
Method 1: Scan all order table shards, then aggregate in memory. This is not feasible in a high-traffic architecture.
Method 2: Double-write the data, but double-writing cannot guarantee consistency.
Method 3: Subscribe to the database change log, e.g. subscribe to MySQL's binlog by simulating the database's master-slave replication mechanism, then parse the change log and write the changes into the order-list table, thereby realizing data heterogeneity.
This mechanism can also ensure data consistency.
For example, the order center shards the order table by order number, and heterogeneous copies are then generated from it: an order-list table sharded by user, merchant order tables, an order cache, and an ES search index.
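As a minimal sketch of the heterogeneity idea in Method 3 (the table names and shard count below are hypothetical, for illustration only), a consumer of the change log would route each captured order row into a list table sharded by the user dimension rather than by order number:

```java
// Sketch: route an order (sharded by orderNo in the source DB) into a
// heterogeneous order-list table sharded by userId instead.
// Table names and the shard count are hypothetical.
public class ShardRouter {
    private static final int USER_SHARDS = 4;

    // Pick the user-order-list table for a given userId.
    public static String userOrderTable(long userId) {
        int shard = (int) (userId % USER_SHARDS);
        return "user_order_list_" + shard;
    }

    public static void main(String[] args) {
        // An INSERT captured from an order shard keyed by order number...
        long orderNo = 20240003L;
        long userId = 42L;
        // ...is re-written into the table sharded by the user dimension:
        System.out.println(userOrderTable(userId)); // user_order_list_2
    }
}
```

Because every copy is derived from the same ordered change log, the user-sharded list eventually reflects exactly the writes that hit the order shards.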

MySQL master-slave replication principle
-- canal works on a similar principle



1. Preparation:
GitHub: https://github.com/alibaba/canal
The repository includes:
1 - canal's documentation
2 - the server side
3 - the client side
4 - examples
5 - the source package, and so on
Background:
In the early days, Alibaba's B2B company had a business requirement for cross-datacenter synchronization, due to its dual-datacenter deployment in Hangzhou and the United States. The early database synchronization business was mainly based on triggers to obtain incremental changes. Starting around 2010, Alibaba companies gradually began to obtain incremental changes by parsing database logs instead, which opened a new era for the business of mass subscription & consumption of incremental changes.
Note: The synchronization currently used internally already supports log parsing for some versions of MySQL 5.x and Oracle.

The canal server subscribes to the database's binlog through the slave mechanism.

Services supported by log-based incremental subscription & consumption:
(1) Database mirroring
(2) Database real-time backup
(3) Multi-level indexes (e.g. separate database indexes for sellers and buyers)
(4) Search index building
(5) Business cache refresh
(6) Notification of key business changes, such as price changes

In short: database synchronization, plus incremental subscription & consumption.


3. The working principle of canal:
At a high level, replication is divided into three steps:
(1) The master records changes in its binary log (these records are called binary log events and can be viewed with show binlog events);
(2) The slave copies the master's binary log events to its relay log;
(3) The slave redoes the events in the relay log, applying the changes to its own data.
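The three steps can be modeled with a toy sketch (plain Java collections standing in for the real logs; this is an illustration, not MySQL code):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of the three replication steps: the master appends events to its
// binary log, the slave's I/O thread copies them into a relay log, and the
// slave's SQL thread redoes each event against the local data. Events are
// modeled as simple "column=value" strings.
public class ReplicationSketch {

    // step 3: parse a "column=value" event and apply it to the slave's data
    static void redo(Map<String, String> slaveData, String event) {
        String[] kv = event.split("=", 2);
        slaveData.put(kv[0], kv[1]);
    }

    public static void main(String[] args) {
        Queue<String> binlog = new ArrayDeque<>();   // step 1: master's binary log
        Queue<String> relayLog = new ArrayDeque<>(); // step 2: slave's relay log
        Map<String, String> slaveData = new HashMap<>();

        binlog.add("name=10");                       // master records a change
        relayLog.add(binlog.poll());                 // slave copies the event
        redo(slaveData, relayLog.poll());            // slave redoes the event

        System.out.println(slaveData); // prints {name=10}
    }
}
```

canal essentially plays the role of the slave's I/O thread: it pretends to be a slave and receives the binlog events, but hands them to a client for processing instead of redoing them locally.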


4. Deploy canal:
4.0 Prerequisites:
A JDK (>= 1.5) needs to be installed, otherwise startup fails with:
Cannot find a Java JDK. Please set either set JAVA or put java (>=1.5) in your PATH.


4.1 Deploy canal-server:
4.1.1 Database configuration:
Enable MySQL's binlog function and configure the binlog mode to row.

Add the following to my.cnf:
vi /etc/my.cnf
[mysqld] 
log-bin=mysql-bin    # enable the binary log 
binlog-format=ROW    # select row mode; do not use statement or mixed mode
server_id=1          # the master's server ID; it must not duplicate the slave's ID, i.e. it must differ from canal's slaveId. This is required to configure MySQL replication.

binlog provides three logging modes (see the book for details). Row mode is recommended when using canal.

In addition, execute "show binary logs" in MySQL to see which binary log files currently exist and their sizes.

4.1.2 Create the canal database management user in MySQL and grant the corresponding (replication) permissions:
CREATE USER canal IDENTIFIED BY 'canal';   
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%'; 
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ; 
FLUSH PRIVILEGES;
Note: be sure to restart MySQL after changing my.cnf, otherwise the binlog settings will not take effect and you will hit errors like the ones shown later.

4.1.3 Download canal from https://github.com/alibaba/canal/releases
and extract it to a folder. For example, I downloaded canal.deployer-1.0.24.tar.gz:
mkdir /usr/server/canal
cd /usr/server
tar -zxvf canal.deployer-1.0.24.tar.gz -C canal

canal file directory structure
drwxr-xr-x 2 jianghang jianghang 136 2013-02-05 21:51 bin 
drwxr-xr-x 4 jianghang jianghang 160 2013-02-05 21:51 conf 
drwxr-xr-x 2 jianghang jianghang 1.3K 2013-02-05 21:51 lib 
drwxr-xr-x 2 jianghang jianghang 48 2013-02-05 21:29 logs 

4.1.4 Modify the configuration of a canal database instance, instance.properties
Note: you can use the existing example instance here, or create a new one. The instance name must match the destination name used in the client Java code.
Here we configure a new canal server instance named product:

// vi /usr/server/canal/conf/example/instance.properties 
mkdir -p /usr/server/canal/conf/product
cp /usr/server/canal/conf/example/instance.properties /usr/server/canal/conf/product/
vi /usr/server/canal/conf/product/instance.properties

#################################################
# mysql serverId: must differ from the master's server ID
canal.instance.mysql.slaveId = 101 
 
# position info: the address of the database to connect to, and which binlog file/offset to start from 
canal.instance.master.address = 127.0.0.1:3306
# the binlog file to start from when connecting to the MySQL master
canal.instance.master.journal.name =
# the binlog file offset to start from when connecting to the MySQL master
canal.instance.master.position =
# the binlog timestamp to start from when connecting to the MySQL master
canal.instance.master.timestamp =

# standby library
#canal.instance.standby.address =  
#canal.instance.standby.journal.name = 
#canal.instance.standby.position =  
#canal.instance.standby.timestamp =  
 
# username/password: change these to your own database information 
canal.instance.dbUsername = canal   
canal.instance.dbPassword = canal 
canal.instance.defaultDatabaseName = canal_test 
canal.instance.connectionCharset = UTF-8

# The following configuration filters which tables in which databases are subscribed, reducing unnecessary subscriptions;
# for example, to focus only on the product databases, subscribe with a pattern like the one below
# table regex 
# canal.instance.filter.regex = .*\\..* 
canal.instance.filter.regex = product_\\d+\\..*
#################################################

Note: if you subscribe to multiple databases, you need to configure multiple instances, one configuration file per database.
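To see how such a filter pattern behaves, here is a small approximation. canal itself evaluates the filter with a Perl-style regex against names of the form schema.table; java.util.regex is used here only as a close-enough illustration of which names a pattern like the one above keeps:

```java
import java.util.regex.Pattern;

// Approximate the canal subscription filter: match a regex against
// "schema.table" names. The pattern keeps every table of the sharded
// product_0..product_N databases and nothing else.
// (canal really uses a Perl-style regex engine; java.util.regex is an
// approximation for illustration.)
public class FilterRegexDemo {

    static boolean subscribed(String pattern, String schemaDotTable) {
        return Pattern.matches(pattern, schemaDotTable);
    }

    public static void main(String[] args) {
        String filter = "product_\\d+\\..*";
        System.out.println(subscribed(filter, "product_0.product_3")); // true
        System.out.println(subscribed(filter, "canal_test.test"));     // false
    }
}
```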


4.1.5 Configure canal Server, modify conf/canal.properties
vi /usr/server/canal/conf/canal.properties

canal.id= 1
canal.ip=
canal.port= 11111
canal.zkServers=


# Instances currently deployed on this canal server; separate multiple instances with commas. Configure product here
canal.destinations= product

# Use zk persistence mode, which ensures cluster data sharing and supports HA
canal.instance.global.spring.xml = classpath:spring/default-instance.xml

4.1.6 Then cd to the bin directory to start and stop canal-server
start:
/usr/server/canal/bin/startup.sh & tail -f /usr/server/canal/logs/canal/canal.log

stop:
/usr/server/canal/bin/stop.sh

To verify the startup status, check the log file:
tail -f /usr/server/canal/logs/canal/canal.log

2014-07-18 10:21:08.525 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## start the canal server. 
2014-07-18 10:21:08.609 [main] INFO com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[10.12.109.201:11111] 
2014-07-18 10:21:09.037 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## the canal server is running now ...... 
---> The log above shows that canal started successfully


********* canal server cluster **************
One canal server can be deployed, or multiple servers can be deployed; but only one is active, and the others act as standby machines. The high availability of canal server is maintained through zk.

Requires: ZooKeeper installed

The configuration file is modified as follows:
vi /usr/server/canal/conf/canal.properties

canal.id= 1
canal.ip=
canal.port= 11111
canal.zkServers=127.0.0.1:2181


4.2 Run canal-client Example:
4.2.1 Create an example Maven project:
mvn archetype:create -DgroupId=com.alibaba.otter -DartifactId=canal.sample 
[In practice: you can also create the Maven project manually]

4.2.2 Add pom dependencies:
<dependency> 
    <groupId>com.alibaba.otter</groupId> 
    <artifactId>canal.client</artifactId> 
    <version>1.0.12</version> 
</dependency> 

4.2.3 Update dependencies mvn install

4.2.4 ClientSample.java
Example code:
package canal.sample;
/**
 * Created by hp on 14-7-17.
 */ 
import java.net.InetSocketAddress; 
import java.util.List;
 
import com.alibaba.otter.canal.client.CanalConnector; 
import com.alibaba.otter.canal.common.utils.AddressUtils; 
import com.alibaba.otter.canal.protocol.Message; 
import com.alibaba.otter.canal.protocol.CanalEntry.Column; 
import com.alibaba.otter.canal.protocol.CanalEntry.Entry; 
import com.alibaba.otter.canal.protocol.CanalEntry.EntryType; 
import com.alibaba.otter.canal.protocol.CanalEntry.EventType; 
import com.alibaba.otter.canal.protocol.CanalEntry.RowChange; 
import com.alibaba.otter.canal.protocol.CanalEntry.RowData; 
import com.alibaba.otter.canal.client.*; 
import com.google.protobuf.InvalidProtocolBufferException;
//import org.jetbrains.annotations.NotNull;  
 
public class ClientSample { 
 
    public static void main(String args[]) throws Exception { 
        // connect to the canal server
        //CanalConnector connector = CanalConnectors.newSingleConnector(new InetSocketAddress(AddressUtils.getHostIp(), 
        CanalConnector connector = CanalConnectors.newSingleConnector(new InetSocketAddress("192.168.1.121",
                11111), "example", "", ""); 
                // 11111), "chapter6", "", "");
       
        /** Connect to the canal server through zk:
        String zkServers = "192.168.1.121:2181";
        // destination instance: you can customize one, such as product
        String destination = "product";
        // connect to the canal server
        CanalConnector connector = CanalConnectors.newClusterConnector(zkServers, destination, "", "");
        **/
       
        int emptyCount = 0; // number of consecutive empty polls
        int totalEmptyCount = 1200; // exit the loop after this many consecutive empty polls
        try { 
            connector.connect(); // connect
            connector.subscribe(".*\\..*"); // subscribe to everything; omitting this line has the same effect
            //connector.subscribe("product_.*\\.product_.*"); // subscribe to the product tables in the product databases
            connector.rollback(); 
            while (emptyCount < totalEmptyCount) {
            //while (true) { // loop forever
                // fetch up to 1000 log entries in one batch (unacknowledged mode)
                Message message = connector.getWithoutAck(1000); // adjust this value to your situation 
                long batchId = message.getId();
               
                //The following is the count of empty runs
                int size = message.getEntries().size(); 
                if (batchId == -1 || size == 0) { 
                    emptyCount++; 
                    System.out.println("empty count : " + emptyCount); 
                    try { 
                        Thread.sleep(1000); 
                    } catch (InterruptedException e) { 
                        e.printStackTrace(); 
                    } 
                } else { 
                    emptyCount = 0;
                    // process the data here
                    // System.out.printf("message[batchId=%s,size=%s] \n", batchId, size); 
                    printEntry(message.getEntries());
                } 
 
                connector.ack(batchId); // commit the acknowledgement 
                // connector.rollback(batchId); // on processing failure, roll back the batch 
            } 
 
            System.out.println("empty too many times, exit"); 
        } finally { 
            connector.disconnect(); 
        } 
    } 
 
    //private static void printEntry(@NotNull List<Entry> entrys) {
    private static void printEntry(List<Entry> entrys) throws InvalidProtocolBufferException {
        for (Entry entry : entrys) {
            if (entry.getEntryType() == EntryType.TRANSACTIONBEGIN || entry.getEntryType() == EntryType.TRANSACTIONEND) { 
                continue; 
            } 
 
            // if this entry is row data
            if (entry.getEntryType() == EntryType.ROWDATA) {
                // parse the row change
                RowChange rowChage = null; 
                try { 
                    rowChage = RowChange.parseFrom(entry.getStoreValue()); 
                } catch (Exception e) { 
                    throw new RuntimeException("ERROR ## parser of eromanga-event has an error , data:" + entry.toString(), e); 
                } 
               
                EventType eventType = rowChage.getEventType();
                // the binlog change information is captured here
//                System.out.println(String.format("================> binlog[%s:%s] , name[%s,%s] , eventType : %s", 
//                        entry.getHeader().getLogfileName(), entry.getHeader().getLogfileOffset(), 
//                        entry.getHeader().getSchemaName(), entry.getHeader().getTableName(), 
//                        eventType));
               
                for (RowData rowData : rowChage.getRowDatasList()) {
                    // on DELETE, get the deleted data (the before image) for business processing
                    if (eventType == EventType.DELETE) { 
                        printColumn(rowData.getBeforeColumnsList());
                        //List<Column> columns = rowData.getBeforeColumnsList();
                        //delete(columns);
                       
                    // on INSERT or UPDATE, get the new/modified data (the after image) for processing
                    } else if (eventType == EventType.INSERT || eventType == EventType.UPDATE) { 
                        //printColumn(rowData.getAfterColumnsList()); 
                        List<Column> columns = rowData.getAfterColumnsList();
                        save(columns);
                       
                    } else { 
                        System.out.println("-------> before"); 
                        printColumn(rowData.getBeforeColumnsList()); 
                        System.out.println("-------> after"); 
                        printColumn(rowData.getAfterColumnsList()); 
                    } 
                }                
               
            }
           
        } 
    } 
   
    // the newly added data-heterogeneity operation
    private static void save(List<Column> columns) {
        for (Column col:columns) {
            String name = col.getName();
            String value = col.getValue();
            System.out.println("name: "+ name + ",value:" + value);
           
            //name: uid,value:4
            //name: name,value:10
        }
       
    }

    //private static void printColumn(@NotNull List<Column> columns) { 
    private static void printColumn(List<Column> columns) { 
        for (Column column : columns) { 
            System.out.println(column.getName() + " : " + column.getValue() + " update=" + column.getUpdated()); 
        } 
    } 
} 

Compile error [the jar for this import was not found, so it is commented out]:
import org.jetbrains.annotations.NotNull;

Through the code above, we capture database log changes and then do the related business processing, whether that is data heterogeneity or a cache update.
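As one hypothetical continuation of save(...), the captured column list could be folded into a map and used to refresh a cache entry. SimpleColumn below is a stand-in for CanalEntry.Column, and the "test:<uid>" key format is invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "business cache refresh" use case: fold the captured columns
// of a row into a name -> value map and derive a cache key from it.
// SimpleColumn stands in for CanalEntry.Column; the key format is hypothetical.
public class CacheRefreshSketch {

    static class SimpleColumn {
        final String name;
        final String value;
        SimpleColumn(String name, String value) { this.name = name; this.value = value; }
    }

    // turn the column list into a column-name -> value map
    static Map<String, String> toRow(List<SimpleColumn> columns) {
        Map<String, String> row = new LinkedHashMap<>();
        for (SimpleColumn c : columns) {
            row.put(c.name, c.value);
        }
        return row;
    }

    // derive the cache key to invalidate or refresh for this row
    static String cacheKey(Map<String, String> row) {
        return "test:" + row.get("uid");
    }

    public static void main(String[] args) {
        List<SimpleColumn> cols = List.of(
                new SimpleColumn("uid", "4"),
                new SimpleColumn("name", "10"));
        Map<String, String> row = toRow(cols);
        System.out.println(cacheKey(row)); // prints test:4
    }
}
```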


4.2.5 Run the client. Before any database change occurs, the output shows only empty polls:


empty count : 1 
empty count : 2 
empty count : 3 
empty count : 4 


4.2.6 Trigger a database change:
create table test ( 
    uid int(4) primary key not null auto_increment, 
    name varchar(10) not null
); 
 
insert into test (name) values('10'); 


4.2.7 client fetches mysql information:
===============> binlog[mysql-bin.000016:3281] , name[canal_test, test] , eventType : INSERT 
uid : 7 update=false 
name : 10 update=false 
empty count : 1 
empty count : 2 

[Case: no information was captured]
tail -f /usr/server/canal/logs/example/example.log

2017-06-03 13:11:28.802 [destination = chapter6 , address = /127.0.0.1:3306 , EventParser] WARN  c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - prepare to find start position just show master status
2017-06-03 13:11:28.817 [destination = chapter6 , address = /127.0.0.1:3306 , EventParser] ERROR c.a.otter.canal.parse.inbound.mysql.MysqlEventParser - dump address /127.0.0.1:3306 has an error, retrying. caused by
com.alibaba.otter.canal.parse.exception.CanalParseException: command : 'show master status' has an error! pls check. you need (at least one of) the SUPER,REPLICATION CLIENT privilege(s) for this operation
2017-06-03 13:11:28.820 [destination = chapter6 , address = /127.0.0.1:3306 , EventParser] ERROR com.alibaba.otter.canal.common.alarm.LogAlarmHandler - destination:chapter6[com.alibaba.otter.canal.parse.exception.CanalParseException: command : 'show master status' has an error! pls check. you need (at least one of) the SUPER,REPLICATION CLIENT privilege(s) for this operation
]

The reason is simple: I had configured MySQL's binlog, but had not restarted MySQL, so the settings had not taken effect.


After the restart, the client can capture MySQL change information:
===============> binlog[mysql-bin.000001:557] , name[chapter6,test] , eventType : INSERT
uid : 1 update=true
name : 10 update=true
empty count : 1
empty count : 2


5. Problems occurred during deployment:

(1) Startup failed, and the log says the address is in use
1. Port 11111 is occupied. Use lsof -i:11111 to see which listening process occupies the port, or ps -ef | grep 11111 to find the process, then kill -9 <pid> to kill it.
2. Alternatively, edit canal/conf/canal.properties and change the port number to an unoccupied one.

(2) canal cannot capture database changes triggered in MySQL
1. Check whether MySQL has the binlog function enabled, and whether binlog is in row mode:
    show variables like "binlog_format" 

2. Check whether the information in the configuration files (my.cnf, instance.properties, etc.) is correct.

3. Check the client code against the example code and debug it.

4. Consider version compatibility issues, e.g. replace canal 1.8 with canal 1.7, and continue testing.

5. Check all the log files and analyze the logs.
