Open Source and Free to Use | Apache Doris 2.0 Launches Cross-Cluster Replication (CCR)

As enterprise businesses grow, system architectures become more complex and data volumes keep increasing. It is increasingly common for data to be distributed across different regions, data centers, or cloud platforms, and ensuring data reliability and the continuity of online services has become a top concern. Against this backdrop, Cross-Cluster Replication (CCR) has emerged and gradually become an important guarantee of high availability for data and services.

CCR is typically used in scenarios such as disaster recovery, read-write separation, data synchronization between group headquarters and subsidiaries, and isolated upgrades.

  • Disaster recovery: Enterprise data is backed up to another cluster or data center so that, when an emergency interrupts the business or causes data loss, data can be restored from the backup or the system can quickly fail over to the standby. Disaster recovery is generally required in scenarios with strict SLA requirements, such as finance, healthcare, and e-commerce.

  • Read-write separation: Read-write separation isolates query operations from write operations to reduce their mutual interference and improve resource utilization. For example, when write pressure on the database is high, or under high concurrency, read-write separation can distribute reads and writes across read-only and write-only database instances in multiple regions, reducing interference between reads and writes and effectively ensuring database performance and stability.

  • Data synchronization between group headquarters and subsidiaries: To manage, govern, and analyze data across the group in a unified way, subsidiaries distributed across regions usually need to synchronize their data to group headquarters in a timely manner. This avoids management confusion and decision-making errors caused by inconsistent data, and helps improve the group's management efficiency and decision quality.

  • Isolated upgrades: When upgrading a cluster, it may be necessary to roll back to the previous version for various reasons, but traditional upgrades often cannot be rolled back because of metadata incompatibility. CCR solves this problem: users can first build a standby cluster for the upgrade, run both clusters in parallel for verification, and then upgrade each cluster in turn. Since CCR does not depend on a specific version, version rollback becomes feasible.

To meet the needs of these scenarios, many data products on the market have introduced CCR, among which Elasticsearch and ClickHouse are representative examples.

  • CCR is a paid feature of Elasticsearch. It is essentially Leader/Follower synchronization: during data ingestion it synchronizes by partition rather than by the actual written data, which can lead to data inconsistency.

  • ClickHouse generally implements CCR through the Remote Function or ClickHouse-Copier. The Remote Function is only suitable for full synchronization and requires traversing tables and partitions; ClickHouse-Copier does not support incremental migration either. Since ClickHouse itself has no transaction design, synchronizing data with the Copier amounts to replica synchronization across clusters, so consistency cannot be guaranteed. Synchronization cannot be configured at the database level and must be configured table by table, which makes it relatively complicated to use and only moderately usable.

Since CCR is a strong enterprise requirement for service availability, many vendors make it a paid, value-added feature that requires purchasing an enterprise edition. Adhering to the principle of openness and open source, we have officially launched CCR in Apache Doris 2.0 to serve the open source community.

Compared with Elasticsearch and ClickHouse, Apache Doris CCR can synchronize data changes from the source cluster to the target cluster at the database or table level, so the synchronization scope can be finely controlled according to the scenario. Users can also flexibly choose full or incremental synchronization as needed, effectively improving the flexibility and efficiency of data synchronization. In addition, Doris CCR supports DDL synchronization: DDL statements executed on the source cluster can be automatically synchronized to the target cluster, ensuring data consistency. Doris CCR is also very simple to configure and use; cross-cluster data replication can be set up with just a few operations. With these capabilities, Doris CCR makes read-write load separation and multi-data-center backup easier to achieve, and can better support cross-cluster replication requirements in different scenarios.

Design of Doris CCR

In Apache Doris 2.0 we introduced the Binlog mechanism to record data modifications, consisting of Meta Binlogs and Data Binlogs. To synchronize data between clusters, we introduced an external component, the Syncer, which fetches the latest Binlogs and replays them on the downstream cluster. We also added a set of mechanisms to clean up obsolete Binlogs. The implementation consists of the following parts:

Add Binlog

In versions prior to Apache Doris 2.0, data modifications in Apache Doris could not be traced, yet a record of data changes is a prerequisite for implementing CCR. To solve this, Apache Doris 2.0 introduces the Binlog mechanism, which automatically records data modifications and operations, making data traceable; based on Binlog playback, it also enables data replay and recovery. Since synchronization is supported at the database and table level, the corresponding Binlog properties must be added to the DB or table before use. Currently Binlog supports two properties: binlog.enable and binlog.ttl_seconds.

Note: enabling Binlog is a prerequisite for using the CCR function.

-- Table
alter table binlog set ("binlog.enable" = "true"); -- enable binlog
alter table binlog set ("binlog.ttl_seconds" = "864000"); -- set the binlog expiration time (in seconds)

-- DB
alter database ccr set properties ("binlog.enable" = "true");
alter database ccr set properties ("binlog.ttl_seconds" = "864000");
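To confirm that the settings took effect, you can inspect the table definition. A minimal check, assuming (as in the example above) the table is named binlog and that the binlog.* settings are displayed as ordinary table properties:

-- the binlog.* settings should appear in the PROPERTIES section of the output
SHOW CREATE TABLE binlog;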

Persistence mechanism

To ensure timely recovery after a crash or other emergencies, we introduced a persistence mechanism that writes data to disk to guarantee reliability and consistency. Persistence involves both the metadata stored on the FE and the actual data stored on the BE. Once the Binlog property is enabled, FE and BE persist the modification records of DDL/DML operations as Meta Binlogs and Data Binlogs. When a data operation is performed, the FE triggers the corresponding log record. We enhanced the EditLog implementation to guarantee log ordering: by constructing a monotonically increasing sequence of LogIDs, every operation is accurately recorded and persisted in order. This ordered persistence mechanism helps ensure data consistency.

When the FE initiates a Publish Transaction, the BE executes the corresponding publish operation and writes the rowset metadata involved in the transaction into KV entries prefixed with rowset_meta, persisting them in the meta store. The imported segment files are hard-linked into the Binlog folder. In this way, the FE's metadata and the BE's data together form a logical Binlog stream. This mechanism enables data recovery through either physical file playback or logical playback, providing an effective solution in terms of both performance and reliability.

(Figure: Binlog persistence on the FE and BE)

Data playback

To connect the source and target clusters, we introduced an intermediate synchronization and control component, the Syncer. Through the Syncer, the Binlog stream of the source cluster can be pulled to another cluster and replayed there, completing data synchronization.

Implementation:

  • Physical file playback is supported; replaying physical files can effectively reproduce the data operation process.

  • The Syncer uses the FE's CommitSeq as a cursor to fetch the Meta Binlog of the next commit. Based on the Meta Binlog information, the Syncer then coordinates the downstream BEs to pull the actual Binlog files from the upstream BEs. This mechanism guarantees both the consistency of the replayed data and efficient synchronization performance.

  • When a synchronization task is created, the Syncer first performs a full restore of the data using snapshot-level Backup/Restore, and then carries out incremental synchronization starting from the restored snapshot's CommitSeq. A hand-run equivalent of this full phase is sketched after the figure below.

(Figure: data playback through the Syncer)
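For intuition, this full phase is conceptually the same as Doris's built-in snapshot Backup/Restore, which the Syncer drives automatically (no manual repository setup is required when using CCR). A hand-run equivalent, with illustrative database, label, and repository names, would look roughly like:

-- On the source cluster: back up a snapshot of the database to a repository
BACKUP SNAPSHOT ccr.snapshot_label1 TO example_repo;

-- On the target cluster: restore it; the backup_timestamp can be obtained
-- via SHOW SNAPSHOT ON example_repo
RESTORE SNAPSHOT ccr.snapshot_label1 FROM example_repo
PROPERTIES ("backup_timestamp" = "2023-08-01-12-00-00");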

Binlog data cleaning

As more data is imported, the operations recorded in the Binlog accumulate and the storage they occupy grows. We therefore need a recycling mechanism to clean up obsolete Binlogs.

When cleaning up Binlog data, attention must be paid to how the DB-level and table-level Binlog GC settings interact. For example, if a user enables a table's Binlog first, then enables the DB's Binlog, and later disables the DB's Binlog again, the table's earlier Binlog-enabled configuration must be preserved. Because the DB's cleanup conditions take precedence over the table's, Binlogs must be cleaned up according to the DB's GC time while the DB's Binlog is enabled; otherwise the cleanup state of the DB and the table would diverge and leave the Binlog inconsistent. A concrete sequence illustrating this interplay is sketched below.
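As a minimal sketch of the scenario above (the table and database names are illustrative):

ALTER TABLE t1 SET ("binlog.enable" = "true");                 -- table-level binlog on, with t1's own TTL
ALTER DATABASE db1 SET properties ("binlog.enable" = "true");  -- DB-level binlog on: the DB's GC time now takes precedence
ALTER DATABASE db1 SET properties ("binlog.enable" = "false"); -- DB-level binlog off again: t1's earlier table-level settings must be preserved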

To handle expiration, the FE periodically scans for expired Binlogs according to their expiration time and sends the corresponding cleanup requests to the BEs; each BE then cleans up the metadata and rowsets of the affected tablets up to the last CommitSeq in the request. Throughout this process, the overlap between DB-level and table-level Binlogs must be taken into account.

How to use CCR

Prerequisites

At present, using CCR requires Doris root privileges. Other notes on permissions are as follows:

  • Fetching Binlogs from the source cluster requires the Master Token, which can only be obtained on the FE with root privileges.

  • Fetching other Binlog metadata from the cluster requires only the SHOW privilege.

  • Replaying the Binlog itself only requires the LOAD privilege on the corresponding table or DB of the target cluster.

Installation and deployment

  1. Deploy the source and target Doris clusters

  2. Deploy the data synchronization component Syncer

Download and compile the source code

git clone https://github.com/selectdb/ccr-syncer
cd ccr-syncer

# -j enables multi-threaded compilation
# --output specifies the output directory name; the default is "output"
bash build.sh <-j NUM_OF_THREAD> <--output SYNCER_OUTPUT_DIR>

The compiled output is placed in the output folder. As with Doris, the start/stop scripts are in bin and the executables are in lib.

# SYNCER_OUTPUT_DIR is the build output path
# SYNCER_DEPLOY_DIR is the actual deployment path
cp -r SYNCER_OUTPUT_DIR SYNCER_DEPLOY_DIR
cd SYNCER_DEPLOY_DIR

# Start the Syncer; add --daemon to run it in the background
bash bin/start_syncer.sh --daemon

# Stop the Syncer
bash bin/stop_syncer.sh

Configure a synchronization task

  1. Add the following configuration to the conf files of both FE and BE to enable Binlog:
enable_feature_binlog=true
  2. Enable the binlog of the database/table to be synchronized in the source cluster:
-- enable database binlog
ALTER DATABASE ccr SET properties ("binlog.enable" = "true");

-- enable table binlog
ALTER TABLE enable_binlog SET ("binlog.enable" = "true");
  3. Submit a synchronization task to the Syncer:
curl -X POST -H "Content-Type: application/json" -d '{
    "name": "ccr_test",
    "src": {
      "host": "localhost",
      "port": "9030",
      "thrift_port": "9020",
      "user": "root",
      "password": "",
      "database": "demo",
      "table": "example_tbl"
    },
    "dest": {
      "host": "localhost",
      "port": "9030",
      "thrift_port": "9020",
      "user": "root",
      "password": "",
      "database": "ccrt",
      "table": "copy"
    }
}' http://127.0.0.1:9190/create_ccr

Parameter description:

  • name: the name of the CCR synchronization task; it only needs to be unique

  • host, port: the host of the cluster's Master FE and its MySQL (JDBC) port

  • thrift_port: the rpc_port of the corresponding FE

  • user, password: the identity the Syncer uses to start transactions, pull data, and so on

  • database, table:

    • For database-level synchronization, fill in database and leave table empty (see the example after this list)

    • For table-level synchronization, both database and table must be filled in and non-empty
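For example, a database-level task looks like the table-level one above with the table fields left empty (the task name and endpoints below are the same illustrative values as before):

curl -X POST -H "Content-Type: application/json" -d '{
    "name": "ccr_db_test",
    "src": {
      "host": "localhost",
      "port": "9030",
      "thrift_port": "9020",
      "user": "root",
      "password": "",
      "database": "demo",
      "table": ""
    },
    "dest": {
      "host": "localhost",
      "port": "9030",
      "thrift_port": "9020",
      "user": "root",
      "password": "",
      "database": "ccrt",
      "table": ""
    }
}' http://127.0.0.1:9190/create_ccr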

View progress and stop tasks

  1. View synchronization progress
curl -X POST -H "Content-Type: application/json" -d '{
    "name": "ccr_test"
}' http://127.0.0.1:9190/get_lag
  2. Stop the task
curl -X POST -H "Content-Type: application/json" -d '{
    "name": "ccr_test"
}' http://127.0.0.1:9190/stop_ccr

Data synchronization performance measurement

To evaluate CCR's data synchronization efficiency, we ran an import test on a full dataset. During the full import, 2 TB of data was synchronized in under 4 hours, with the write speed of a single node exceeding 170 MB/s; as a sanity check, 2,097,152 MB over 4 hours averages out to roughly 145 MB/s of sustained throughput, consistent with a peak above 170 MB/s. The detailed results are shown in the table below. As the cluster scales out, write throughput grows linearly.

Note that this performance test was run in a specific environment with a specific configuration; results will vary across environments, versions, and configurations, so the numbers are for reference only.

Full synchronization

Both the source and target clusters consist of 1 FE and 1 BE. The system and hardware information is as follows:

(Figures: system and hardware information of the test environment)

Source cluster data volume: 2,097,152 MB (2 TB)

Target cluster data volume: 0

Full synchronization performance test results

(Figure: full synchronization performance test results)

Future plans

Doris CCR currently supports data synchronization at both the table and database levels. At the table level, it supports various data import methods, as well as lightweight and heavyweight schema changes, including adding single-table materialized views, to cover more flexible synchronization requirements; it also supports dynamic and manual partitioning. At the database level, whole-database synchronization is supported, replicating the data of all tables in the source cluster to the target cluster. In addition, CCR synchronizes table creation and deletion: when tables are created or dropped in the source cluster, the change is automatically applied to the target cluster, keeping the data consistent.

Going forward, we will continue working to improve the synchronization capabilities and performance of Doris CCR, mainly including:

  • Further enhancing database-level DDL operations to provide more flexible and reliable data synchronization and management;

  • Supporting user-defined Binlog consumption, so the Binlog can be consumed directly with SELECT statements and the data returned through the corresponding driver, e.g. as rowsets of the MySQL protocol;

  • Supporting synchronization in logical data formats, allowing the target cluster's BEs to fetch incremental data (Binlog) from the source cluster's BEs in standard formats such as CSV or Parquet, which makes it easy to synchronize between versions whose underlying BE data formats (rowsets) are incompatible;

  • Supporting hot-cold data separation and improving support for Doris's own tiered storage;

  • Supporting blacklists in database-level synchronization to filter out specific tables, so that when some tables in a database do not need to be synchronized, users can still use CCR conveniently without creating a separate task for every table (which also makes it easier to safeguard the business running on those tables);

  • Supporting active/standby switchover for the target cluster; with Binlog enabled, incremental data can then be synchronized back to the source cluster;

  • Enhancing the Syncer's operations and observability features: improving its deployment and maintenance capabilities, monitoring its overhead and the progress of synchronization tasks, and supporting more operational actions, distributed deployment, and more kinds of databases.

Users with related needs are welcome to share their requirements or questions in the comments.

About the authors:

Xu Ruiliang, Senior R&D Engineer at SelectDB

Li Shiyang, Ecosystem R&D Engineer at SelectDB
