Bifrost: 3 billion+ rows of data synchronized per day on 2 CPU cores and 1 GB of memory!

I would like to thank @yocoo for providing Tencent Cloud hosting for performance testing

Github: https://github.com/brokercap/Bifrost

Gitee: https://gitee.com/jc3wish/Bifrost

Machine A: runs Bifrost plus the program that writes randomly generated data to MySQL

Configuration: 4 cores

Memory: 8G

 

Machine B: MySQL

Configuration: 4 cores

Memory: 8G

Disk: 1000G

 

Machine C: ClickHouse

Configuration: 4 cores

Memory: 8G

Disk: 1000G

 

Software version:

Bifrost Version: v1.6.2-release

MySQL Version: 5.7.30-log

ClickHouse Version: 20.8.3.18

insertDataTest Version: 1.6.2

 

1. Preparations

1). Download Bifrost source code and compile insertDataTest

git clone -b v1.6.x https://github.com/brokercap/Bifrost.git ./BifrostV1.6.x

cd BifrostV1.6.x/test

go build ./insertDataTest.go

2). Install MySQL and ClickHouse, create the bifrost_test database, and create the corresponding tables

MySQL table structure (create 5 MySQL tables with this structure)

CREATE TABLE `binlog_field_test_5` (
  `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  `testtinyint` TINYINT(4) NOT NULL DEFAULT '-1',
  `testsmallint` SMALLINT(6) NOT NULL DEFAULT '-2',
  `testmediumint` MEDIUMINT(8) NOT NULL DEFAULT '-3',
  `testint` INT(11) NOT NULL DEFAULT '-4',
  `testbigint` BIGINT(20) NOT NULL DEFAULT '-5',
  `testvarchar` VARCHAR(400) NOT NULL,
  `testchar` CHAR(2) NOT NULL,
  `testenum` ENUM('en1','en2','en3') NOT NULL DEFAULT 'en1',
  `testset` SET('set1','set2','set3') NOT NULL DEFAULT 'set1',
  `testtime` TIME NOT NULL DEFAULT '00:00:01',
  `testdate` DATE NOT NULL DEFAULT '1970-01-01',
  `testyear` YEAR(4) NOT NULL DEFAULT '1989',
  `testtimestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `testdatetime` DATETIME NOT NULL DEFAULT '1970-01-01 00:00:00',
  `testfloat` FLOAT(9,2) NOT NULL DEFAULT '0.00',
  `testdouble` DOUBLE(9,2) NOT NULL DEFAULT '0.00',
  `testdecimal` DECIMAL(9,2) NOT NULL DEFAULT '0.00',
  `testtext` TEXT NOT NULL,
  `testblob` BLOB NOT NULL,
  `testbit` BIT(64) NOT NULL DEFAULT b'0',
  `testbool` TINYINT(1) NOT NULL DEFAULT '0',
  `testmediumblob` MEDIUMBLOB NOT NULL,
  `testlongblob` LONGBLOB NOT NULL,
  `testtinyblob` TINYBLOB NOT NULL,
  `test_unsinged_tinyint` TINYINT(4) UNSIGNED NOT NULL DEFAULT '1',
  `test_unsinged_smallint` SMALLINT(6) UNSIGNED NOT NULL DEFAULT '2',
  `test_unsinged_mediumint` MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT '3',
  `test_unsinged_int` INT(11) UNSIGNED NOT NULL DEFAULT '4',
  `test_unsinged_bigint` BIGINT(20) UNSIGNED NOT NULL DEFAULT '5',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=40000000000 DEFAULT CHARSET=utf8;
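
The article only shows the DDL for binlog_field_test_5; the other four tables share the same structure. Below is a minimal setup sketch: the database and table names follow the article, everything else (the LIKE shorthand and the binlog checks) is an assumption, added here because a binlog-based tool such as Bifrost needs the binlog enabled in ROW format.

-- Create the test database and clone the table structure for the other four tables
CREATE DATABASE IF NOT EXISTS bifrost_test DEFAULT CHARACTER SET utf8;
USE bifrost_test;
-- after creating binlog_field_test_5 with the DDL above:
CREATE TABLE binlog_field_test_1 LIKE binlog_field_test_5;
CREATE TABLE binlog_field_test_2 LIKE binlog_field_test_5;
CREATE TABLE binlog_field_test_3 LIKE binlog_field_test_5;
CREATE TABLE binlog_field_test_4 LIKE binlog_field_test_5;
-- Quick check that the binlog is enabled and in ROW format (standard MySQL commands):
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';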

 

ClickHouse table structure (create 5 ClickHouse tables with this structure)

CREATE TABLE binlog_field_test_5 (
  `id` UInt64,
  `testtinyint` Int8,
  `testsmallint` Int16,
  `testmediumint` Int32,
  `testint` Int32,
  `testbigint` Int64,
  `testvarchar` String,
  `testchar` String,
  `testenum` String,
  `testset` String,
  `testtime` String,
  `testdate` Date,
  `testyear` Int16,
  `testtimestamp` DateTime,
  `testdatetime` DateTime,
  `testfloat` Float64,
  `testdouble` Float64,
  `testdecimal` String,
  `testtext` String,
  `testblob` String,
  `testbit` Int64,
  `testbool` Int8,
  `testmediumblob` String,
  `testlongblob` String,
  `testtinyblob` String,
  `test_unsinged_tinyint` UInt8,
  `test_unsinged_smallint` UInt16,
  `test_unsinged_mediumint` UInt32,
  `test_unsinged_int` UInt32,
  `test_unsinged_bigint` UInt64,
  `binlog_event_type` String,
  `bifrost_data_version` Int64
) ENGINE = MergeTree() ORDER BY id
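
Only one ClickHouse table is shown; the other four per-table targets, and the binlog_field_test_all table used by the full-sync test in section 5, can presumably be cloned from it. The sketch below is an assumption (the AS shorthand and the identical structure for binlog_field_test_all are not stated in the article); the two extra columns binlog_event_type and bifrost_data_version are presumably filled by Bifrost during log-append synchronization.

-- Clone structure and engine for the remaining target tables
CREATE TABLE binlog_field_test_1 AS binlog_field_test_5;
CREATE TABLE binlog_field_test_2 AS binlog_field_test_5;
CREATE TABLE binlog_field_test_3 AS binlog_field_test_5;
CREATE TABLE binlog_field_test_4 AS binlog_field_test_5;
-- Target table for the full-sync test in section 5 (structure assumed identical)
CREATE TABLE binlog_field_test_all AS binlog_field_test_5;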

 

2. The first test

2.1 Start the test

In MySQL, the 5 binlog_field_test tables are created and switched to the InnoDB engine, for example:
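
Since the DDL above declares ENGINE=MyISAM, the tables need to be converted for this test; the exact statements are not given in the article, but the conversion would look like this:

ALTER TABLE binlog_field_test_1 ENGINE=InnoDB;
-- repeat for binlog_field_test_2 through binlog_field_test_5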

The synchronization to ClickHouse uses one synchronization configuration per table, with fields strongly bound (mapped one-to-one), and runs in log-append mode.

 

Write data to MySQL (executed on Machine A, the same machine that runs Bifrost):

nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_1 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./1.log &

nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_2 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./2.log &

nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_3 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./3.log &

nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_4 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./4.log &

nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_5 -count 1000000000 -batchsize 2000 -time_interval 50 >> ./5.log &
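
A simple way to watch how many rows the commands above have already written into MySQL while the test runs is a standard information_schema query (not from the article; table_rows is an estimate for InnoDB tables and exact for MyISAM):

SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = 'bifrost_test'
ORDER BY table_name;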

2.2 Results of the first test

With the configuration above, Bifrost's synchronization position keeps up with the MySQL write speed, parsing an average of about 2.6 million rows (roughly 360 MB of data) per minute.

However, after running for about 17 hours the MySQL disk filled up, no more data could be written, and synchronization paused. In total more than 2.9 billion rows, close to 3 billion, had been written. No detailed records were kept, but the Bifrost process's CPU usage fluctuated around 160%-210% and its actual memory usage fluctuated between roughly 220 MB and 700 MB.

Once MySQL has no more data to write, Bifrost's memory drops back to 220 MB and the process does not exit, which suggests that it can keep running stably as long as data keeps coming in.

 

3. The second test

3.1 Start the test

This test uses the MyISAM engine table structure, synchronizes 3 tables, and increases the rate at which data is written into each table (2,000-row batches every 50 ms):

nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_1 -count 1000000000 -batchsize 2000 -time_interval 50 >> ./1.log &

nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_2 -count 1000000000 -batchsize 2000 -time_interval 50 >> ./2.log &

nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_3 -count 1000000000 -batchsize 2000 -time_interval 50 >> ./3.log &

3.2 Results of the second test

In actual execution, Bifrost's binlog parsing cannot keep up with the MySQL write speed; Bifrost synchronizes about 2.7-2.8 million rows per minute.
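
One way to gauge the lag from the MySQL side is to compare the server's current binlog position with the position reported in the Bifrost admin UI (a standard MySQL command, not from the article):

SHOW MASTER STATUS;
-- compare File/Position with the binlog position shown by Bifrost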

 

4. The third test

4.1 Start the test

Using the MyISAM engine table structure, 3 tables are synchronized, and each table writes a 1,000-row batch every 100 ms:

nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_1 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./1.log &

nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_2 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./2.log &

nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_3 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./3.log &

In this test, binlog parsing essentially keeps up with only a slight delay, which suggests that MyISAM is faster than InnoDB for single-threaded batch writes.

The parsing speed is about 2.7 million rows (about 380 MB) per minute; under the current hardware configuration, with ClickHouse flushed every 1,000 rows, this appears to be the limit.

Hardware usage:

Bifrost uses about 550 MB of memory, slightly more than with the InnoDB engine; the reason is unclear.

CPU usage also fluctuates around 160%-210%.

After 620 million rows had been synchronized, another 10 million rows were written into a fourth table to see whether the synchronization position could keep up. The gap started to widen, reaching nearly 200 million and continuing to grow, while the rows and data volume processed per minute did not increase, which indicates that about 2.7 million rows/minute (380 MB/minute) is the upper limit for the current hardware configuration. Once the position lag reached about 100 seconds, the write process for the fourth table was stopped.

4.2 Results of the third test

The traffic chart shows no synchronization data at 4:40, but the log shows the last data was actually written at 4:29; the time shown here is only a parameter of the traffic statistics, which are aggregated over an interval and calculated from the time displayed in the traffic view.

Synchronizing the 3 billion rows took about 17.5 hours, an average of roughly 2.9 million rows per minute and about 389 MB per minute.

Hardware usage:

After synchronization completes, Bifrost's actual memory usage drops back to 240 MB.

 

5. The fourth test (full test)

5.1 Start the test

Since full synchronization supports configuring the number of threads for pulling data and for writing data, for convenience the three tables binlog_field_test_1, binlog_field_test_2, and binlog_field_test_3 are all synchronized into the single binlog_field_test_all table in ClickHouse.

 

The full-sync task is configured with 3 threads for pulling data and 6 threads for writing data.
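
Conceptually, each pull thread pages through a source table by primary key while the sync threads batch the rows into ClickHouse; a rough, hypothetical illustration of one pull chunk (not Bifrost's actual implementation, the chunk size and bookkeeping are made up here):

SELECT *
FROM bifrost_test.binlog_field_test_1
WHERE id > 0        -- last id seen in the previous chunk (hypothetical)
ORDER BY id
LIMIT 10000;        -- hypothetical chunk size

With 3 source tables, 3 pull threads allow roughly one pull per table.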

 

Sync screenshot

During synchronization the machine's CPU can be fully loaded, while memory usage stays relatively low.

5.2 Results of the fourth test 

The full sync of 3,069,529,000 rows took a total of 12 hours and 33 minutes.
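
A simple way to verify this count after the full sync finishes (not part of the article) is to count the target table in ClickHouse and compare it with the sum of the three MySQL source tables:

-- ClickHouse side
SELECT count() FROM binlog_field_test_all;
-- MySQL side
SELECT (SELECT COUNT(*) FROM binlog_field_test_1)
     + (SELECT COUNT(*) FROM binlog_field_test_2)
     + (SELECT COUNT(*) FROM binlog_field_test_3) AS total_rows;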

After full synchronization completes, CPU and memory usage return to normal.

6. Summary of this round of testing

Running Bifrost on a 4-core, 8 GB machine to synchronize data:

With tables of 30 fields, incremental synchronization runs at about 2.8 million rows per minute (roughly 380 MB), which works out to about 4 billion rows per day (2.8 million x 1,440 minutes), while CPU usage stays around 2 cores and memory below 700 MB. With more tables being synchronized and less data per individual table, performance may be even better because the CPU can be used more fully.

Because the number of threads for pulling data and for writing data is configurable, and the data being synchronized does not have to arrive in order, the thread counts can be tuned to the number of CPU cores. In testing, with a reasonable configuration the CPU can be fully loaded while memory usage stays low.

Overall, Bifrost does not need much memory, so when choosing a machine you can lean toward compute-optimized instances!

 

                              Increment              Full
Table Count                   3                      3
Data Count                    3,069,529,000          3,069,529,000
Select/Parse Thread Count     1                      3
Sync Thread Count             3                      6
Max CPU Usage                 220%                   400%
Max Memory Use                < 1G                   < 1G
Time Used                     17 hours 30 minutes    12 hours 33 minutes
Data Count/minute             2,923,360              4,076,399

 
