A Step-by-Step Practical Guide to Migrating OceanBase Data to Apache Doris

Author | SelectDB technical team

OceanBase is a widely recognized distributed database that runs in the business-critical systems of many enterprises. In the Apache Doris community, many users build data processing and analysis pipelines on top of OceanBase and Apache Doris. This article describes in detail how to migrate or synchronize data from OceanBase to Apache Doris conveniently and efficiently.

Practical guide

00 Environment preparation

Use Docker to start the OceanBase service. To set up the OceanBase Docker environment, refer to "Use Docker to deploy the OceanBase database" in the OceanBase documentation.

docker run -p 2881:2881 --name oceanbase -e MINI_MODE=1 -d oceanbase/oceanbase-ce:4.0.0.0

Create a table in OceanBase and add data

[root@VM-10-6-centos ~]$ mysql -h127.0.0.1 -P2881 -uroot

mysql> CREATE DATABASE ob;                                                                                                                                 
Query OK, 1 row affected (0.01 sec)
                                                                                                                                                           
mysql> use ob;                                                                                                                                             
Database changed  

mysql> CREATE TABLE student (                                                                                                                              
    -> id int,                                                                                                                                     
    -> name varchar(256),                                                                                                                              
    -> age int,                                                                                                                                    
    -> primary key (id)                                                                                                                                  
    -> );                                                                                                                                                  
Query OK, 0 rows affected (0.06 sec)


mysql> insert into student values(1, 'zhangsan01', 18),                                                                                                    
    ->                           (2, 'zhangsan02', 23),                                                                                                    
    ->                           (3, 'zhangsan03', 30),                                                                                                    
    ->                           (4, 'zhangsan04', 35),                                                                                                    
    ->                           (5, 'zhangsan05', 40);                                                                                                    
Query OK, 5 rows affected (0.01 sec)                                                                                                                       
Records: 5  Duplicates: 0  Warnings: 0  

Create a table in Doris

[root@VM-10-6-centos ~]$ mysql -h127.0.0.1 -P9030 -uroot -p

mysql> CREATE TABLE `student` (                                                                                                                            
    ->    id int,                                                                                                                                          
    ->   `name` varchar(256),
    ->   `age` int                                                                                                                              
    -> ) ENGINE=OLAP                                                                                                                                       
    -> UNIQUE KEY(`id`)                                                                                                                                    
    -> COMMENT 'OLAP'                                                                                                                                      
    -> DISTRIBUTED BY HASH(`id`) BUCKETS 1                                                                                                                 
    -> PROPERTIES (                                                                                                                                        
    -> "replication_allocation" = "tag.location.default: 1"                                                                                                
    -> );                                                                                                                                                  
Query OK, 0 rows affected (0.06 sec) 

01 Synchronize using DataX

DataX is the open-source version of Alibaba Cloud DataWorks Data Integration. It provides two plugins, OceanBaseReader and DorisWriter, that make it easy to migrate data from OceanBase to Doris. The steps are as follows:

1. Download DataX

2. Write DataX configuration file

{                                                                                                                                                          
    "job": {                                                                                                                                               
        "setting": {                                                                                                                                       
            "speed": {                                                                                                                                     
                "channel": 1                                                                                                                               
            }                                                                                                                                              
        },                                                                                                                                                 
        "content": [                                                                                                                                       
            {                                                                                                                                              
                "reader": {                                                                                                                                
                    "name": "oceanbasev10reader",                                                                                                          
                    "parameter": {                                                                                                                         
                        "username": "root",                                                                                                                
                        "password": "123456",                                                                                                              
                        "column": ["*"],                                                                                                                   
                        "connection": [                                                                                                                    
                            {                                                                                                                              
                                "table": ["student"],                                                                                                   
                                "jdbcUrl": ["jdbc:oceanbase://127.0.0.1:2881/ob"]                                                                       
                            }                                                                                                                              
                        ]                                                                                                                                  
                    }                                                                                                                                      
                },
                "writer": {
                    "name": "doriswriter",
                    "parameter": {
                        "loadUrl": ["127.0.0.1:28737"],
                        "column": ["*"],
                        "username": "root",                                                                                                                
                        "password": "",                                                                                                                    
                        "postSql": [],                                                                                                                     
                        "preSql": [],                                                                                                                      
                        "flushInterval":10000,                                                                                                             
                        "connection": [                                                                                                                    
                          {                                                                                                                                
                            "jdbcUrl": "jdbc:mysql://127.0.0.1:29737/test",                                                                               
                            "selectedDatabase": "test",                                                                                                    
                            "table": ["student"]                                                                                                     
                          }                                                                                                                                
                        ],                                                                                                                                 
                        "loadProps": {                                                                                                                     
                            "format": "json",                                                                                                              
                            "strip_outer_array": true                                                                                                      
                        }                                                                                                                                  
                    }                                                                                                                                      
                }                                                                                                                                          
            }                                                                                                                                              
        ]                                                                                                                                                  
    }                                                                                                                                                      
}        

For more configuration options, refer to the Oceanbasev10reader and DorisWriter documentation.
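Because most JSON parsers silently keep only the last value when an object repeats a key, a repeated entry such as loadProps in a job file can go unnoticed. The following is a minimal Python sketch, not part of DataX itself, that rejects repeated keys and reports which reader and writer a job file uses. The function names reject_duplicate_keys and load_job are hypothetical, and the check assumes the job layout shown above (job.content[0].reader/writer).

```python
import json
from collections import Counter


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    counts = Counter(key for key, _ in pairs)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate keys in job file: {duplicates}")
    return dict(pairs)


def load_job(text):
    """Parse a DataX job (as text) and return (reader_name, writer_name)."""
    job = json.loads(text, object_pairs_hook=reject_duplicate_keys)
    # Assumption: the job has a single content entry, as in this article.
    content = job["job"]["content"][0]
    return content["reader"]["name"], content["writer"]["name"]


if __name__ == "__main__":
    sample = """
    {"job": {"setting": {"speed": {"channel": 1}},
             "content": [{"reader": {"name": "oceanbasev10reader", "parameter": {}},
                          "writer": {"name": "doriswriter", "parameter": {}}}]}}
    """
    print(load_job(sample))
```

To check a real job file, the contents of oceanbase2doris.json can be read and passed to load_job before running DataX.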

3. Execute DataX script

python2 bin/datax.py oceanbase2doris.json

4. Verify the synchronization. The data in Doris is now:

mysql> select * from student;                                                                                                                              
+------+------------+------+                                                                                                                               
| id   | name       | age  |                                                                                                                               
+------+------------+------+                                                                                                                               
|    1 | zhangsan01 |   18 |                                                                                                                               
|    2 | zhangsan02 |   23 |                                                                                                                               
|    3 | zhangsan03 |   30 |                                                                                                                               
|    4 | zhangsan04 |   35 |                                                                                                                               
|    5 | zhangsan05 |   40 |                                                                                                                               
+------+------------+------+                                                                                                                               
5 rows in set (0.02 sec) 

02 Synchronize using Catalog

With the Catalog feature of Apache Doris, a table in OceanBase can be mapped into Doris, and its data can then be synchronized to Doris with an INSERT or CREATE TABLE AS SELECT statement.

Download the OceanBase driver package into the jdbc_drivers directories of the FE and BE, then execute the following statements in order:

-- Create the catalog
CREATE CATALOG jdbc_oceanbase PROPERTIES (
    "type"="jdbc",
    "user"="root",
    "password"="123456",
    "jdbc_url" = "jdbc:oceanbase://127.0.0.1:2881/ob",
    "driver_url" = "oceanbase-client-2.4.2.jar",
    "driver_class" = "com.oceanbase.jdbc.Driver"
);

-- Query the OceanBase table from Doris
mysql> select * from jdbc_oceanbase.ob.student;                                                                                                            
+------+------------+------+                                                                                                                               
| id   | name       | age  |                                                                                                                               
+------+------------+------+                                                                                                                               
|    1 | zhangsan01 |   18 |                                                                                                                               
|    2 | zhangsan02 |   23 |                                                                                                                               
|    3 | zhangsan03 |   30 |                                                                                                                               
|    4 | zhangsan04 |   35 |                                                                                                                               
|    5 | zhangsan05 |   40 |                                                                                                                               
+------+------------+------+                                                                                                                               
5 rows in set (0.02 sec)

mysql> CREATE TABLE internal.test.student                                                                                                                  
    -> PROPERTIES("replication_num" = "1")                                                                                                                 
    -> AS SELECT * FROM jdbc_oceanbase.ob.student;                                                                                                         
Query OK, 5 rows affected (0.07 sec)                                                                                                                       
{'label':'label_139f7d7f13ba491b_85038d67c9e3ae32', 'status':'VISIBLE', 'txnId':'12014'}


mysql> select * from internal.test.student;                                                                                                                
+------+------------+------+                                                                                                                               
| id   | name       | age  |                                                                                                                               
+------+------------+------+                                                                                                                               
|    5 | zhangsan05 |   40 |                                                                                                                               
|    1 | zhangsan01 |   18 |                                                                                                                               
|    2 | zhangsan02 |   23 |                                                                                                                               
|    4 | zhangsan04 |   35 |                                                                                                                               
|    3 | zhangsan03 |   30 |                                                                                                                               
+------+------------+------+                                                                                                                               
5 rows in set (0.03 sec) 
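The query result above comes back in a different order from the source table, since no ORDER BY is given. A verification step should therefore compare the two tables without relying on row order. The following is a minimal Python sketch under that assumption; the function name diff_rows is hypothetical, and the rows are assumed to have been fetched already with any MySQL-protocol client.

```python
from collections import Counter


def diff_rows(source_rows, target_rows):
    """Compare two result sets as multisets, ignoring row order.

    Returns (missing, extra): rows present in the source but not in the
    target, and rows present in the target but not in the source.
    """
    src = Counter(map(tuple, source_rows))
    dst = Counter(map(tuple, target_rows))
    missing = list((src - dst).elements())
    extra = list((dst - src).elements())
    return missing, extra


if __name__ == "__main__":
    oceanbase = [(1, "zhangsan01", 18), (2, "zhangsan02", 23)]
    doris = [(2, "zhangsan02", 23), (1, "zhangsan01", 18)]
    missing, extra = diff_rows(oceanbase, doris)
    print("missing:", missing, "extra:", extra)
```

Using a multiset rather than a plain set also catches duplicated rows, which matters when the target table does not enforce a unique key.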

03 Synchronize using Flink CDC

Flink CDC provides the OceanBase CDC connector, which reads both snapshot data and incremental data from OceanBase. The steps are as follows:

1. Prepare the environment

Start OceanBase and OBLogProxy

docker pull oceanbase/oceanbase-ce:4.0.0.0
docker run --name oceanbase --network=host -e MINI_MODE=1 -d oceanbase/oceanbase-ce:4.0.0.0

docker pull whhe/oblogproxy:1.1.0_4x
docker run --network=host --name oceanbase_proxy -e OB_SYS_USERNAME=root -e OB_SYS_PASSWORD=123456 -d whhe/oblogproxy:1.1.0_4x

2. Set password

In OceanBase, the root user has no password by default. OBLogProxy requires a sys tenant user with a non-empty password, so you need to set a password for the root@sys user first.

-- Log in to the sys tenant as the root user
mysql -h127.0.0.1 -P2881 -uroot@sys 

-- Set the password to the OB_SYS_PASSWORD value used above
MySQL [(none)]> ALTER USER root IDENTIFIED BY '123456';                                                                                                    
Query OK, 0 rows affected (0.02 sec)    

-- Log in to the test tenant as the root user and set its password to test
mysql -h127.0.0.1 -P2881 -uroot@test 
MySQL [(none)]> ALTER USER root IDENTIFIED BY 'test';
Query OK, 0 rows affected (0.02 sec)

-- Create the database and table, then insert data
mysql> CREATE DATABASE ob;
mysql> USE ob;
mysql> CREATE TABLE student (                                                                                                                              
    -> id int,                                                                                                                                     
    -> name varchar(256),                                                                                                                              
    -> age int,                                                                                                                                    
    -> primary key (id)                                                                                                                                  
    -> );                                                                                                                                                  
Query OK, 0 rows affected (0.06 sec)

mysql> insert into student values(1, 'zhangsan01', 18),                                                                                                    
    ->                           (2, 'zhangsan02', 23),                                                                                                    
    ->                           (3, 'zhangsan03', 30),                                                                                                    
    ->                           (4, 'zhangsan04', 35),                                                                                                    
    ->                           (5, 'zhangsan05', 40);                                                                                                    
Query OK, 5 rows affected (0.01 sec)                                                                                                                       
Records: 5  Duplicates: 0  Warnings: 0  

3. Flink environment configuration

Place the OceanBase CDC jar package and the Doris Connector jar in the FLINK_HOME/lib directory, then restart the Flink cluster.

4. Execute Flink SQL tasks

SET 'execution.checkpointing.interval' = '3s';

CREATE TABLE student (
    id INT,
    name STRING,
    age INT,
    PRIMARY KEY (id) NOT ENFORCED
  ) WITH (
    'connector' = 'oceanbase-cdc',
    'scan.startup.mode' = 'initial',
    'username' = 'root@test',
    'password' = 'test',
    'tenant-name' = 'test',
    'database-name' = 'ob',
    'table-name' = 'student',
    'hostname' = 'localhost',
    'port' = '2881',
    'rootserver-list' = '127.0.0.1:2882:2881',
    'logproxy.host' = 'localhost',
    'logproxy.port' = '2983',
    'working-mode' = 'memory'
  );
 
CREATE TABLE doris_sink (
    id INT,
    name STRING,
    age INT
)
WITH (
'connector' = 'doris',
'fenodes' = '10.16.10.6:28737',
'table.identifier' = 'test.student',
'username' = 'root',
'password' = ''
);
  
INSERT into doris_sink select * from student;
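The rootserver-list option in the source table definition above packs several values into one string. The following Python sketch shows one way to sanity-check such a value before submitting the job. It assumes the format is a semicolon-separated list of ip:rpc_port:sql_port entries, which should be confirmed against the OceanBase CDC connector documentation for the version in use; the function name parse_rootserver_list is hypothetical.

```python
def parse_rootserver_list(value):
    """Split a rootserver-list string into (host, rpc_port, sql_port) tuples.

    Assumed format: "ip:rpc_port:sql_port[;ip:rpc_port:sql_port...]".
    """
    servers = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        parts = entry.split(":")
        if len(parts) != 3:
            raise ValueError(f"expected ip:rpc_port:sql_port, got {entry!r}")
        host, rpc_port, sql_port = parts
        servers.append((host, int(rpc_port), int(sql_port)))
    return servers


if __name__ == "__main__":
    print(parse_rootserver_list("127.0.0.1:2882:2881"))
```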

After submitting the job, you can query the fully synchronized data in Doris:

mysql> select * from  student;                                                                                                                             
+------+------------+------+                                                                                                                               
| id   | name       | age  |                                                                                                                               
+------+------------+------+                                                                                                                               
|    1 | zhangsan01 |   18 |                                                                                                                               
|    2 | zhangsan02 |   23 |                                                                                                                               
|    3 | zhangsan03 |   30 |                                                                                                                               
|    4 | zhangsan04 |   35 |                                                                                                                               
|    5 | zhangsan05 |   40 |                                                                                                                               
+------+------------+------+                                                                                                                               
5 rows in set (0.01 sec)  

Next, insert a new row in OceanBase to simulate incremental data:

MySQL [ob]> insert into student values(6, 'zhangsan06', 48)                                                                                                
    -> ;                                                                                                                                                   
Query OK, 1 row affected (0.13 sec)  

Once the change has been captured, the new row can be queried in Doris:

mysql> select * from  student;                                                                                                                             
+------+------------+------+                                                                                                                               
| id   | name       | age  |                                                                                                                               
+------+------------+------+                                                                                                                               
|    1 | zhangsan01 |   18 |                                                                                                                               
|    2 | zhangsan02 |   23 |                                                                                                                               
|    3 | zhangsan03 |   30 |                                                                                                                               
|    4 | zhangsan04 |   35 |                                                                                                                               
|    5 | zhangsan05 |   40 |                                                                                                                               
|    6 | zhangsan06 |   48 |                                                                                                                               
+------+------------+------+                                                                                                                               
6 rows in set (0.02 sec) 

Note: OceanBase 3.x and 4.x are supported. Make sure the OBLogProxy version matches the OceanBase version. For details, refer to the GitHub Release notes.

04 Export using Outfile

You can also use OceanBase's Outfile feature to export data to a local file or to OSS, and then import it into Doris with Stream Load or S3 Load. The following example uses a local file:

MySQL [ob]> select * from student;                                                                                                                         
+----+------------+------+                                                                                                                                 
| id | name       | age  |                                                                                                                                 
+----+------------+------+                                                                                                                                 
|  1 | zhangsan01 |   18 |                                                                                                                                 
|  2 | zhangsan02 |   23 |                                                                                                                                 
|  3 | zhangsan03 |   30 |                                                                                                                                 
|  4 | zhangsan04 |   35 |                                                                                                                                 
|  5 | zhangsan05 |   40 |                                                                                                                                 
|  6 | zhangsan06 |   48 |                                                                                                                                 
+----+------------+------+                                                                                                                                 
6 rows in set (0.00 sec)   

MySQL [ob]> SELECT id,name,age INTO OUTFILE '/home/student.csv' 
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\n' FROM student;
Query OK, 3 rows affected (0.01 sec)


#cat student.csv
1,zhangsan01,18
2,zhangsan02,23
3,zhangsan03,30
4,zhangsan04,35
5,zhangsan05,40
6,zhangsan06,48
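Before loading the exported file, it can help to confirm that every line matches the table layout (id, name, age) and to count the rows. The following is a minimal Python sketch of such a check; the function name validate_student_csv is hypothetical, and it assumes the comma-separated, newline-terminated format used in the export statement above.

```python
import csv
import io


def validate_student_csv(stream):
    """Check that each line has 3 fields with integer id and age.

    Returns the number of rows read; raises ValueError on a bad line.
    """
    count = 0
    for line_no, row in enumerate(csv.reader(stream), start=1):
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(row)}")
        int(row[0])  # id must be an integer
        int(row[2])  # age must be an integer
        count += 1
    return count


if __name__ == "__main__":
    sample = io.StringIO("1,zhangsan01,18\n2,zhangsan02,23\n")
    print(validate_student_csv(sample))
```

For the real export, the file can be opened with open("/home/student.csv", newline="") and passed to the same function; the returned count can then be compared with the row count in OceanBase.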

Run Stream Load to import the local file into Doris:

curl  --location-trusted -u root:  -H "column_separator:," -T student.csv http://127.0.0.1:28737/api/test/student/_stream_load
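Stream Load reports its result in a JSON response body, so a script should inspect that body rather than rely on the HTTP status alone. The following Python sketch checks a response that has already been captured as text, for example from the curl output. The field names Status, Message, and NumberLoadedRows reflect the Stream Load response format as I understand it and should be treated as assumptions to confirm against the Doris documentation; the function name check_stream_load is hypothetical.

```python
import json


def check_stream_load(response_text, expected_rows=None):
    """Raise if a Stream Load response does not report success.

    Returns the number of loaded rows reported by Doris.
    """
    result = json.loads(response_text)
    status = result.get("Status")
    # Assumption: only "Success" is treated as OK here; other statuses
    # should be looked up in the Doris documentation before accepting them.
    if status != "Success":
        raise RuntimeError(f"Stream Load failed: {status}: {result.get('Message')}")
    loaded = result.get("NumberLoadedRows")
    if expected_rows is not None and loaded != expected_rows:
        raise RuntimeError(f"expected {expected_rows} rows, Doris loaded {loaded}")
    return loaded


if __name__ == "__main__":
    sample = '{"Status": "Success", "Message": "OK", "NumberLoadedRows": 6}'
    print(check_stream_load(sample, expected_rows=6))
```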

After the import completes, the imported data can be queried in Doris:

mysql> select * from student;                                                                                                                              
+------+------------+------+                                                                                                                               
| id   | name       | age  |                                                                                                                               
+------+------------+------+                                                                                                                               
|    1 | zhangsan01 |   18 |                                                                                                                               
|    2 | zhangsan02 |   23 |                                                                                                                               
|    3 | zhangsan03 |   30 |                                                                                                                               
|    4 | zhangsan04 |   35 |                                                                                                                               
|    5 | zhangsan05 |   40 |                                                                                                                               
|    6 | zhangsan06 |   48 |                                                                                                                               
+------+------------+------+                                                                                                                               
6 rows in set (0.05 sec)    

Data type mapping

OceanBase supports both MySQL mode and Oracle mode in the same system, so the type mapping between OceanBase and Apache Doris is the same as the MySQL mapping or the Oracle mapping, depending on the mode in use. When mapping OceanBase tables to Apache Doris, define tables and columns according to the corresponding mapping referenced below to keep migration and synchronization smooth.

01 MySQL mode type mapping

For details, please refer to: JDBC Catalog - MySQL Documentation

02 Oracle mode type mapping

For details, please refer to: JDBC Catalog - Oracle Documentation

Conclusion

This article introduced several ways to synchronize OceanBase data to Doris, covering the needs of different scenarios. For offline synchronization, choose DataX, Catalog, or Outfile. For real-time synchronization, choose Flink CDC, which handles both full and incremental data.

Origin my.oschina.net/selectdb/blog/11053991