I would like to thank @yocoo for providing Tencent Cloud hosting for performance testing
Github: https://github.com/brokercap/Bifrost
Gitee: https://gitee.com/jc3wish/Bifrost
Machine A: runs Bifrost plus a program that writes randomly generated data into MySQL
Configuration: 4 cores
Memory: 8G
Machine B: MySQL
Configuration: 4 cores
Memory: 8G
Disk: 1000G
Machine C: ClickHouse
Configuration: 4 cores
Memory: 8G
Disk: 1000G
Software version:
Bifrost Version: v1.6.2-release
MySQL Version: 5.7.30-log
ClickHouse Version: 20.8.3.18
insertDataTest Version: 1.6.2
1. Preparations
1). Download Bifrost source code and compile insertDataTest
git clone -b v1.6.x https://github.com/brokercap/Bifrost.git ./BifrostV1.6.x
cd BifrostV1.6.x/test
go build ./insertDataTest.go
2). Install MySQL and ClickHouse, then create the bifrost_test database and the corresponding tables
MySQL table structure (create 5 MySQL tables with this structure, suffixed _1 through _5)
CREATE TABLE `binlog_field_test_5` (
  `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  `testtinyint` TINYINT(4) NOT NULL DEFAULT '-1', `testsmallint` SMALLINT(6) NOT NULL DEFAULT '-2',
  `testmediumint` MEDIUMINT(8) NOT NULL DEFAULT '-3', `testint` INT(11) NOT NULL DEFAULT '-4',
  `testbigint` BIGINT(20) NOT NULL DEFAULT '-5', `testvarchar` VARCHAR(400) NOT NULL, `testchar` CHAR(2) NOT NULL,
  `testenum` ENUM('en1','en2','en3') NOT NULL DEFAULT 'en1', `testset` SET('set1','set2','set3') NOT NULL DEFAULT 'set1',
  `testtime` TIME NOT NULL DEFAULT '00:00:01', `testdate` DATE NOT NULL DEFAULT '1970-01-01',
  `testyear` YEAR(4) NOT NULL DEFAULT '1989', `testtimestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `testdatetime` DATETIME NOT NULL DEFAULT '1970-01-01 00:00:00',
  `testfloat` FLOAT(9,2) NOT NULL DEFAULT '0.00', `testdouble` DOUBLE(9,2) NOT NULL DEFAULT '0.00',
  `testdecimal` DECIMAL(9,2) NOT NULL DEFAULT '0.00',
  `testtext` TEXT NOT NULL, `testblob` BLOB NOT NULL, `testbit` BIT(64) NOT NULL DEFAULT b'0',
  `testbool` TINYINT(1) NOT NULL DEFAULT '0',
  `testmediumblob` MEDIUMBLOB NOT NULL, `testlongblob` LONGBLOB NOT NULL, `testtinyblob` TINYBLOB NOT NULL,
  `test_unsinged_tinyint` TINYINT(4) UNSIGNED NOT NULL DEFAULT '1',
  `test_unsinged_smallint` SMALLINT(6) UNSIGNED NOT NULL DEFAULT '2',
  `test_unsinged_mediumint` MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT '3',
  `test_unsinged_int` INT(11) UNSIGNED NOT NULL DEFAULT '4',
  `test_unsinged_bigint` BIGINT(20) UNSIGNED NOT NULL DEFAULT '5',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=40000000000 DEFAULT CHARSET=utf8;
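Since the five MySQL tables are identical apart from the numeric suffix, the remaining four can be cloned from the first with CREATE TABLE ... LIKE. A minimal sketch (it only prints the statements; pipe the output into a mysql client connected to bifrost_test to execute them):

```shell
# Emit CREATE TABLE statements for tables 1-4, cloning table 5's structure.
for i in 1 2 3 4; do
  echo "CREATE TABLE \`binlog_field_test_${i}\` LIKE \`binlog_field_test_5\`;"
done
```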
ClickHouse table structure (create 5 ClickHouse tables with this structure)
CREATE TABLE binlog_field_test_5 (
  `id` UInt64, `testtinyint` Int8, `testsmallint` Int16, `testmediumint` Int32,
  `testint` Int32, `testbigint` Int64, `testvarchar` String, `testchar` String,
  `testenum` String, `testset` String, `testtime` String, `testdate` Date,
  `testyear` Int16, `testtimestamp` DateTime, `testdatetime` DateTime,
  `testfloat` Float64, `testdouble` Float64, `testdecimal` String,
  `testtext` String, `testblob` String, `testbit` Int64, `testbool` Int8,
  `testmediumblob` String, `testlongblob` String, `testtinyblob` String,
  `test_unsinged_tinyint` UInt8, `test_unsinged_smallint` UInt16,
  `test_unsinged_mediumint` UInt32, `test_unsinged_int` UInt32,
  `test_unsinged_bigint` UInt64,
  `binlog_event_type` String, `bifrost_data_version` Int64
) ENGINE = MergeTree() ORDER BY id
2. The first test
2.1 Start the test
MySQL: create the 5 binlog_field_test tables using the InnoDB engine.
Synchronization to ClickHouse uses one sync configuration per table, with strict field binding, in log-append mode.
Write data to MySQL (executed on machine A, the same machine as Bifrost):
nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_1 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./1.log &
nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_2 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./2.log &
nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_3 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./3.log &
nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_4 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./4.log &
nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_5 -count 1000000000 -batchsize 2000 -time_interval 50 >> ./5.log &
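The five commands differ only in the table suffix, log file, and (for table 5) the batch parameters, so they can be generated with a short loop. A sketch that echoes the commands for a dry run; pipe its output to `sh` to actually launch the writers:

```shell
# Emit the five writer commands; table 5 uses a larger batch size and a
# shorter interval, matching the commands above.
for i in 1 2 3 4 5; do
  if [ "$i" -eq 5 ]; then batch=2000; interval=50; else batch=1000; interval=100; fi
  echo "nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest" \
       "-schema bifrost_test -table binlog_field_test_$i -count 1000000000" \
       "-batchsize $batch -time_interval $interval >> ./$i.log &"
done
```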
2.2 The first test result
With the above configuration, Bifrost's sync position keeps up with MySQL's write speed, parsing an average of about 2.6 million rows and processing about 360 MB of data per minute.
However, after running for 17 hours the MySQL disk filled up, writes failed, and synchronization paused. Roughly 2.9-3 billion rows had been written in total (exact counts were not recorded). The Bifrost process's CPU usage fluctuated around 160%-210%, and its actual memory usage between roughly 220 MB and 700 MB.
Once MySQL had no more data to write, Bifrost's memory dropped back to 220 MB and the process did not exit, suggesting it can keep running stably as long as data keeps coming in.
3. Second test
3.1 Start the test
This time the tables use the MyISAM engine, 3 tables are actually synchronized, and the write rate into each table is increased:
nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_1 -count 1000000000 -batchsize 2000 -time_interval 50 >> ./1.log &
nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_2 -count 1000000000 -batchsize 2000 -time_interval 50 >> ./2.log &
nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_3 -count 1000000000 -batchsize 2000 -time_interval 50 >> ./3.log &
3.2 Second Test Results
In actual execution, Bifrost's binlog parsing cannot keep up with MySQL's write speed; Bifrost synchronizes about 2.7-2.8 million rows per minute.
4. The third test
4.1 Start the test
Using the MyISAM engine table structure, 3 tables are actually synchronized, with each table receiving 1000 rows every 100 ms:
nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_1 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./1.log &
nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_2 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./2.log &
nohup ./insertDataTest -host 10.0.3.31 -user xxtest -pwd xxtest -schema bifrost_test -table binlog_field_test_3 -count 1000000000 -batchsize 1000 -time_interval 100 >> ./3.log &
In the actual test, binlog parsing basically keeps up with only a slight delay, which shows that MyISAM is faster than InnoDB for single-threaded sequential inserts.
The parsing speed is about 2.7 million rows per minute, at about 380 MB (under the current hardware configuration, with ClickHouse flushing every 1000 rows, this speed may be the limit).
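As a rough sanity check on those two figures (assuming 380 MB means 380×1024×1024 bytes), the throughput works out to roughly 150 bytes of binlog data per row:

```shell
# Average bytes per row at ~380 MB and ~2.7 million rows per minute
# (integer shell arithmetic, so the result is truncated).
echo $((380 * 1024 * 1024 / 2700000))
```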
Hardware usage
Bifrost uses about 550 MB of memory, slightly higher than with the InnoDB engine; the reason is unknown.
CPU usage also fluctuates around 160%-210%.
After 620 million rows had been synchronized, 10 million rows were written to a fourth table to see whether the sync position could keep up. The position gap grew to nearly 200 million and kept widening, while the rows and bytes processed per minute did not increase, indicating that 2.7 million rows/minute and 380 MB/minute is the ceiling for the current hardware configuration. When the position gap reached about 100 s, writing to the fourth table was paused.
4.2 Results of the third test
The chart shows no synchronization data at 4:40, but the log shows the last write at 4:29; the time displayed is only a parameter for the traffic statistics and includes an aggregation window, so rates are calculated from the displayed traffic time.
Synchronizing about 3 billion rows took roughly 17.5 hours, an average of about 2.92 million rows/minute and 389 MB/minute.
Hardware usage
After synchronization completes, Bifrost's actual memory usage drops back to 240 MB.
5. The fourth test (full test)
5.1 Start the test
Since the full-sync configuration supports setting the number of threads for pulling data and for writing data, for convenience the three tables binlog_field_test_1, binlog_field_test_2, and binlog_field_test_3 are synchronized into a single ClickHouse table, binlog_field_test_all.
The full task uses 3 threads for pulling data and 6 threads for writing data.
Sync screenshot
During synchronization the machine's CPU can be fully loaded, while memory usage stays relatively low.
5.2 Results of the fourth test
The full sync of 3,069,529,000 rows took a total of 12 hours 33 minutes.
After the full synchronization completes, CPU and memory return to their normal state.
6. Summary of this round of testing
Running Bifrost on a 4-core 8 GB machine to synchronize data:
With 30 fields per table, incremental synchronization runs at about 2.8 million rows per minute (about 380 MB), or roughly 4 billion rows per day, using about 2 CPU cores and less than 700 MB of memory. With more tables and less data per table, performance may be better and the CPU can be used more fully.
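The "roughly 4 billion rows per day" figure follows directly from the per-minute rate:

```shell
# 2.8 million rows/min sustained over 24 hours
echo $((2800000 * 60 * 24))
```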
Because the number of pulling and writing threads is configurable, and the data itself can be synchronized out of order, thread counts can be tuned to the number of CPU cores. Testing shows that with a reasonable configuration the CPU can be fully loaded while memory usage stays low.
Overall, Bifrost's memory requirements are not high; when choosing a configuration, you can favor compute-optimized machines!
|                           | Incremental   | Full          |
|---------------------------|---------------|---------------|
| Table Count               | 3             | 3             |
| Data Count                | 3,069,529,000 | 3,069,529,000 |
| Select/Parse Thread Count | 1             | 3             |
| Sync Thread Count         | 3             | 6             |
| Max CPU Usage             | 220%          | 400%          |
| Max Memory Usage          | < 1 GB        | < 1 GB        |
| Time Used                 | 17 h 30 min   | 12 h 33 min   |
| Data Count/minute         | 2,923,360     | 4,076,399     |
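The per-minute figures in the table follow directly from the totals and elapsed times (integer division):

```shell
# Derive rows-per-minute for each test from total rows and elapsed minutes.
total=3069529000
incr_min=$((17 * 60 + 30))   # incremental: 17 h 30 min = 1050 min
full_min=$((12 * 60 + 33))   # full:        12 h 33 min = 753 min
echo "incremental: $((total / incr_min)) rows/min"
echo "full: $((total / full_min)) rows/min"
```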