Option 8: CockroachDB
CockroachDB is a distributed database that supports transactions, SQL, and a key-value storage mode. Its three founders all came from Google, and its architecture is inspired by Google's Spanner and F1; the project is open source. Its main features:
- A standard SQL interface over the PostgreSQL wire protocol, compatible with the relational SQL ecosystem;
- Strong scalability and high concurrency, with support for an MPP-like parallel query framework;
- Elastic scaling: capacity can be expanded on demand, with automatic load balancing;
- Strongly consistent multi-replica storage, using the Raft consensus algorithm to keep replicas in sync;
- A highly available, decentralized service with no single point of failure (SPOF);
- Distributed transactions implemented on top of MVCC, supporting SI and SSI isolation levels.
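Because CockroachDB speaks the PostgreSQL wire protocol, any PostgreSQL driver (lib/pq, pgx, psycopg2, ...) can connect to it. A minimal Go sketch of assembling such a connection URL follows; the host, user, and database names are hypothetical examples, and 26257 is CockroachDB's default SQL port:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN assembles a PostgreSQL-style connection URL for CockroachDB.
// In a real program the result would be passed to sql.Open("postgres", dsn).
func buildDSN(user, host string, port int, db string) string {
	u := url.URL{
		Scheme:   "postgresql",
		User:     url.User(user),
		Host:     fmt.Sprintf("%s:%d", host, port),
		Path:     "/" + db,
		RawQuery: "sslmode=disable", // insecure mode; a production cluster would use certs
	}
	return u.String()
}

func main() {
	fmt.Println(buildDSN("root", "localhost", 26257, "defaultdb"))
	// → postgresql://root@localhost:26257/defaultdb?sslmode=disable
}
```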
Research
Creating the test table
DROP TABLE IF EXISTS "tracks";
CREATE TABLE IF NOT EXISTS "tracks" (
"id" SERIAL PRIMARY KEY,
"third_tracks_id" varchar(32) NOT NULL DEFAULT '',
"tracks_title" varchar(255) NOT NULL DEFAULT '',
"tracks_title_other" varchar(255) NOT NULL DEFAULT '',
"tracks_title_py" varchar(64) NOT NULL DEFAULT '',
"data_source" bigint DEFAULT 1 NOT NULL,
"tags" varchar(255) NOT NULL DEFAULT '',
"duration" bigint DEFAULT 0 NOT NULL,
"status" int DEFAULT 0 NOT NULL,
"pa" int DEFAULT 0 NOT NULL,
"announcer_name" varchar(255) NOT NULL DEFAULT '',
"anchor_name" varchar(255) NOT NULL DEFAULT '',
"play_count" bigint DEFAULT 0 NOT NULL,
"own_count" bigint DEFAULT 0 NOT NULL,
"paid" int DEFAULT 0 NOT NULL,
"info" text NOT NULL,
"created_at" timestamp NOT NULL,
"updated_at" timestamp NOT NULL,
"data_updated" bigint NOT NULL,
"created" timestamp NOT NULL,
"updated" timestamp NOT NULL,
"announcer_id" varchar(256) NOT NULL DEFAULT '',
"anchor_id" varchar(256) NOT NULL DEFAULT '',
UNIQUE INDEX "idx_thirdTrackId_dataSource" (third_tracks_id ASC, data_source ASC),
INDEX "idx_announcerid_status_paid_playcount" (announcer_id ASC, status ASC, paid ASC, play_count DESC)
);
Key points
1. Comments on table columns are not supported; 2. Deleting a large amount of data in a single statement is not supported:
> DELETE FROM tracks where id <100000;
pq: kv/txn_coord_sender.go:428: transaction is too large to commit: 189948 intents
Because strong consistency is required, deleting a large amount of data in one transaction increases cluster latency. If a large amount of data must be deleted, use segmented (batched) deletion:
alter table tracks rename to tracks_0907;
-- pseudocode: delete in id batches of 2000 so each transaction stays small;
-- max_id is the table's max(id), captured once before the loop
for (i = 2000; i <= max_id; i += 2000) {
    DELETE FROM tracks_0907 WHERE id <= i;
}
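The batching above can be sketched in Go. This is a sketch only: the table name `tracks_0907` and batch size 2000 come from the example, and actually issuing each statement through a database driver is omitted:

```go
package main

import "fmt"

// batchDeletes returns the sequence of DELETE statements used for segmented
// deletion: each statement removes at most `batch` more rows than the previous
// one, keeping every transaction small enough to commit. maxID would come
// from SELECT max(id) before the loop starts.
func batchDeletes(table string, maxID, batch int) []string {
	var stmts []string
	for i := batch; ; i += batch {
		stmts = append(stmts, fmt.Sprintf("DELETE FROM %s WHERE id <= %d;", table, i))
		if i >= maxID {
			break
		}
	}
	return stmts
}

func main() {
	// For a table whose max id is 5000 and batches of 2000,
	// three statements cover the whole id range.
	for _, s := range batchDeletes("tracks_0907", 5000, 2000) {
		fmt.Println(s)
	}
}
```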
Performance pressure test
Pressure test tool: a load-test script implemented in Go.
Method: three machines run the load-test script, each run lasts 5-10 minutes, and each record is about 1.6 KB. The test cluster consists of 3 machines with the following configuration:
M02-XI3
Chassis SN: 216486580
CPU: [INTEL Xeon E5-2650 V4 12C 2.2GHz] x2
Memory: [LANGCHAO PC4-19200 16G] x8
Disk: [LANGCHAO SATA 3T 7.2K] x4
Flash: [LANGCHAO NVMe SSD 800G] x1
NIC: [LANGCHAO INTEL 82599] x1
Accelerator card: none
RAID: no hardware RAID card
The scenario simulated is migrating online data from MySQL to NewSQL, as realistically as possible. Benchmark reports for MySQL and other databases often show QPS in the tens of thousands, but on closer inspection the rows used in those tests are only about 50 bytes each, which does not match real online data.
The three machines support roughly 1000 client connections in total, so the tests are capped at 1000 concurrent connections.
Load-test results
# | SQL | Rows | Concurrency | QPS | p99 latency | p90 latency | SQL byte traffic | Remarks |
---|---|---|---|---|---|---|---|---|
1 | insert | 15M | 100 | 666 | 152ms | 117ms | 832KB | 18 indexes |
2 | insert | 15M | 300 | 687 | 352ms | 187ms | 872KB | 18 indexes |
3 | insert | 15M | 900 | 778 | 1500ms | 700ms | 1.2MB | 18 indexes |
4 | insert | 15M | 1000 | 807 | 1500ms | 1200ms | 1.3MB | 18 indexes |
5 | insert | 15M | 100 | 1051 | 42ms | 12ms | 1.2MB | 1 index |
6 | insert | 15M | 300 | 2254 | 92ms | 32ms | 3.2MB | 1 index |
7 | insert | 15M | 600 | 4021 | 130ms | 56ms | 6.1MB | 1 index |
8 | insert | 15M | 900 | 5938 | 250ms | 148ms | 8.4MB | 1 index |
9 | insert | 15M | 1000 | 6125 | 270ms | 171ms | 8.7MB | 1 index |
10 | select * from tracks WHERE id = <random id> AND status = 0 | 15M | 300 | 5625 | 30ms | 10ms | 7.3MB | primary key index |
11 | select * from tracks WHERE id = <random id> AND status = 0 | 15M | 600 | 8713 | 45ms | 8ms | 11.7MB | primary key index |
12 | select * from tracks WHERE id = <random id> AND status = 0 | 15M | 1000 | 12320 | 160ms | 130ms | 16.2MB | primary key index |
13 | select * from tracks WHERE id IN (<20 random ids>) AND status = 0 | 15M | 300 | 2134 | 200ms | 140ms | 29.7MB | primary key index |
14 | select * from tracks WHERE id IN (<20 random ids>) AND status = 0 | 15M | 600 | 2526 | 420ms | 350ms | 34.1MB | primary key index |
15 | select * from tracks WHERE id IN (<20 random ids>) AND status = 0 | 15M | 1000 | 2650 | 771ms | 640ms | 36.1MB | primary key index |
16 | select * from tracks WHERE id IN (<50 random ids>) AND status = 0 | 15M | 300 | 714 | 670ms | 540ms | 23.2MB | primary key index |
17 | select * from tracks WHERE id IN (<50 random ids>) AND status = 0 | 15M | 600 | 672 | 1700ms | 1300ms | 21.4MB | primary key index |
18 | select * from tracks WHERE id IN (<50 random ids>) AND status = 0 | 15M | 600 | 757 | 3000ms | 2490ms | 24.7MB | primary key index |
19 | SELECT * FROM "tracks" WHERE "third_tracks_id" IN (<1 random id>) AND "data_source" = <random company> AND "status" = 0 | 15M | 300 | 5553 | 40ms | 5ms | 4.1MB | (third_tracks_id, data_source) index |
20 | SELECT * FROM "tracks" WHERE "third_tracks_id" IN (<1 random id>) AND "data_source" = <random company> AND "status" = 0 | 15M | 600 | 8624 | 120ms | 18ms | 6.0MB | (third_tracks_id, data_source) index |
21 | SELECT * FROM "tracks" WHERE "third_tracks_id" IN (<1 random id>) AND "data_source" = <random company> AND "status" = 0 | 15M | 1000 | 1145 | 310ms | 90ms | 8.7MB | (third_tracks_id, data_source) index |
22 | SELECT * FROM "tracks" WHERE "announcer_id" IN (<1 random id>) AND "paid" = 0 AND "status" = 0 ORDER BY play_count DESC | 15M | 300 | 5493 | 160ms | 3ms | 5.1MB | (announcer_id ASC, status ASC, paid ASC, play_count DESC) index |
23 | SELECT * FROM "tracks" WHERE "announcer_id" IN (<1 random id>) AND "paid" = 0 AND "status" = 0 ORDER BY play_count DESC | 15M | 600 | 7825 | 283ms | 23ms | 7.5MB | (announcer_id ASC, status ASC, paid ASC, play_count DESC) index |
24 | SELECT * FROM "tracks" WHERE "announcer_id" IN (<1 random id>) AND "paid" = 0 AND "status" = 0 ORDER BY play_count DESC | 15M | 1000 | 11171 | 310ms | 68ms | 10.5MB | (announcer_id ASC, status ASC, paid ASC, play_count DESC) index |
25 | select * from tracks WHERE id = <random id> AND status = 0 | 30M | 300 | 5614 | 123ms | 3ms | 8.3MB | primary key index |
26 | select * from tracks WHERE id = <random id> AND status = 0 | 30M | 600 | 8723 | 130ms | 8ms | 12.4MB | primary key index |
27 | select * from tracks WHERE id = <random id> AND status = 0 | 30M | 1000 | 10764 | 320ms | 30ms | 16.2MB | primary key index |
28 | select * from tracks WHERE id = <random id> AND status = 0 | 50M | 300 | 5136 | 159ms | 8ms | 7.8MB | primary key index |
29 | select * from tracks WHERE id = <random id> AND status = 0 | 50M | 600 | 8463 | 180ms | 13ms | 11.8MB | primary key index |
30 | select * from tracks WHERE id = <random id> AND status = 0 | 50M | 1000 | 10848 | 220ms | 26ms | 16.3MB | primary key index |
Data analysis
- As with MySQL and other databases, inserts are much slower when indexes are present, but the effect is noticeably more pronounced in CockroachDB.
From the chart we can conclude:
- A large number of indexes has a huge impact on insert: write QPS drops sharply, and raising concurrency does not significantly improve it;
- With only the primary key index, insert QPS rises with concurrency until about 6000 QPS, after which additional concurrency no longer helps much.
- Load testing primary-key lookups with different numbers of ids gives the chart below.
From the chart we can conclude:
- The more ids in a query, the more ranges must be scanned and the lower the QPS;
- More ids also means more data returned and larger network IO, which lowers performance;
- Comparing the query efficiency of the primary key index and the composite index:
The chart shows:
- Indexed queries are dramatically faster, and primary key, unique, and composite indexes perform roughly the same;
- Primary key lookups are the most efficient;
- Comparing query efficiency at different table sizes, the chart shows:
- Table size does not greatly affect QPS, at least at the tens-of-millions-of-rows level;
- The smaller the table, the faster the queries.