Three Sites, Five Centers: Exploring the Best Posture for a TiDB POC

Original source: https://tidb.net/blog/b4732d88

POC testing background

In an earthquake-prone province, to prevent a data-center-level or city-level disaster caused by an earthquake from making the whole system unavailable, the plan was to build a distributed, highly available database system spanning three cities and five data centers with five replicas. The system has to serve application traffic coming from different regions, and the applications were designed early on around a horizontal sharding scheme of 100 databases with 100 tables each. To minimize both the application-to-database latency and the cross-data-center, cross-city latency between the database's compute layer and storage layer, the business traffic and the leaders of the corresponding data shards must be kept in the same data center.

POC test items

# Sensitive information has been redacted; the test case list can be used for reference and learning.

[Figure: POC test item list]

Test environment information

Machine software environment configuration

There are 12 physical machines in total on Alibaba private cloud hosting; 10 are used to deploy the cluster and 2 are used to deploy synchronization programs or to test HTAP capacity expansion.

The configuration of a single machine is as follows:

| Configuration item | Configuration information |
| --- | --- |
| OS / kernel | Kylin V10, ARM architecture |
| CPU | Haiguang, 64 cores (hyper-threaded) |
| Memory | 512 GB |
| Disk | 4 × NVMe SSD 3 TB |
| TiDB version | v7.1.0 → v7.1.1 |

Machine and location information

There are three cities: cd, ya, lz

Five data centers: cd has two data centers, AZ1 and AZ2; ya has two, AZ3 and AZ4; lz has one, AZ5.

Latency: the latency between the two data centers in the same city is under 1 ms; between cd and ya it is 3 ms; between cd and lz it is 7 ms; between ya and lz it is 9 ms.

Machine placement topology: two machines per data center.

Traffic unit control

Requirements

The three-site, five-center architecture has the following requirements:

  1. The leaders of the 25 databases db_00-db_24 are in AZ1, the leaders of db_25-db_49 are in AZ2, the leaders of db_50-db_74 are in AZ3, and the leaders of db_75-db_99 are in AZ4.
  2. AZ5 must never hold a leader, even if any one of the first four AZs fails.
  3. Five replicas (max-replicas = 5).

[Figure: target distribution of leaders and replicas across the five AZs]

Solution (contains bugs)

1. Label the machines

Using the two machines in az1 as an example:
tikv_servers:
  - host: host1
    config:
      server.labels: { az: "az1", host: "host1" }
  - host: host2
    config:
      server.labels: { az: "az1", host: "host2" }
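
For the labels to take part in scheduling, PD also needs to know the label hierarchy and the replica count. A minimal sketch of the matching PD settings in the TiUP topology file (the exact placement of this section in the topology file is an assumption here):

server_configs:
  pd:
    replication.location-labels: ["az", "host"]   # label hierarchy matching server.labels on TiKV
    replication.max-replicas: 5                   # five replicas, as required by the POC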

2. Set placement rules at the AZ level to determine each replica's role.

Placement rule for the az1 data replica (a full-function voter); the rules for az2/az3/az4 are similar:
{
    "group_id": "pd",
    "id": "cd1",
    "start_key": "",
    "end_key": "",
    "role": "voter",
    "count": 1,
    "label_constraints": [
        {"key": "az", "op": "in", "values": ["az1"]}
    ],
    "location_labels": ["az", "host"]
}
Placement rule for the az5 data replica (replication only, serves no traffic):
{
    "group_id": "pd",
    "id": "lz",
    "start_key": "",
    "end_key": "",
    "role": "follower",
    "count": 1,
    "label_constraints": [
        {"key": "az", "op": "in", "values": ["az5"]}
    ],
    "location_labels": ["az", "host"]
}

For details, see the Placement Rules documentation in the PingCAP docs.
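
As a usage sketch (the file name and the PD endpoint are illustrative), the AZ-level rules above can be exported, edited, and written back with pd-ctl:

# Export the current placement rules, edit rules.json, then save it back
tiup ctl:v<CLUSTER_VERSION> pd -u http://127.0.0.1:2379 config placement-rules load --out="rules.json"
tiup ctl:v<CLUSTER_VERSION> pd -u http://127.0.0.1:2379 config placement-rules save --in="rules.json"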

3. Use Placement Rules in SQL to configure the leader placement rules

Create the placement policy for data whose leaders go to az1:
CREATE PLACEMENT POLICY p1 LEADER_CONSTRAINTS="+az=az1" FOLLOWER_CONSTRAINTS="{+az=az2: 1,+az=az3: 1,+az=az4: 1,+az=az5: 1}";
Under the 100-database, 100-table sharding scheme, each AZ holds roughly 25 databases and 2,500+ tables.
Generate the SQL statements that change the tables' placement policy, roughly 2,500+ DDLs:
SELECT CONCAT('ALTER TABLE ', table_schema, '.', table_name, ' PLACEMENT POLICY = p1;') FROM information_schema.tables WHERE RIGHT(table_schema, 2) BETWEEN '00' AND '24' ORDER BY table_schema;
Note: if a database already has a placement policy, new tables created in it without an explicit policy inherit the database's policy.

For details, see Placement Rules in SQL in the PingCAP docs.
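
The policies for the other traffic units follow the same pattern; a brief sketch (the policy and database names assume the naming used above):

-- Policy for leaders in az2; p3 and p4 are analogous for az3 and az4
CREATE PLACEMENT POLICY p2 LEADER_CONSTRAINTS="+az=az2" FOLLOWER_CONSTRAINTS="{+az=az1: 1,+az=az3: 1,+az=az4: 1,+az=az5: 1}";
-- A policy can also be bound at the database level so that tables created later inherit it
ALTER DATABASE db_00 PLACEMENT POLICY = p1;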

bug1

**bug description:** TiDB v7.1.0 PD scheduling bug

**bug phenomenon:** Data replicas are not scheduled according to the rules configured with Placement Rules in SQL.

[Figure: bug 1 symptom (replicas not placed as configured)]

**bug solution:**

Upgrade PD from v7.1.0 to v7.1.1, or upgrade the whole cluster to v7.1.1.
The v7.1.1 PD component can be obtained from the v7.1.1 mirror source; pull the required components on a machine with Internet access:
tiup mirror clone tidb-community-server-${version}-linux-amd64 ${version} --os=linux --arch=amd64
Reference: https://docs.pingcap.com/zh/tidb/stable/production-deployment-using-tiup#%E5%87%86%E5%A4%87-tiup-%E7%A6%BB%E7%BA%BF%E7%BB%84%E4%BB%B6%E5%8C%85
Apply a hot patch:
tiup cluster patch <cluster-name> <package-path> [flags]
Reference: https://docs.pingcap.com/zh/tidb/stable/tiup-component-cluster-patch#tiup-cluster-patch
Or upgrade the whole cluster; reference: https://docs.pingcap.com/zh/tidb/stable/upgrade-tidb-using-tiup#%E4%BD%BF%E7%94%A8-tiup-%E5%8D%87%E7%BA%A7-tidb
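
For example, hot-patching only the PD nodes could look like the following (the package path is illustrative and must match the ARM architecture of these hosts):

# Replace the PD binaries on all PD nodes with the patched v7.1.1 package
tiup cluster patch <cluster-name> pd-v7.1.1-linux-arm64.tar.gz -R pd --overwrite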

bug2

**bug description:** Placement rules do not fully take effect

**bug phenomenon:** After setting placement rules by AZ, Region leaders of system tables, such as those in the mysql and information_schema schemas, still appear in az5, and some tables end up with more than five replicas.

**bug solution:** Abandon the approach of defining replica roles per AZ and adopt the reject-leader solution instead. Reference: the cross-data-center deployment topology documentation in the PingCAP docs.

[Figure: bug 2 symptom (system-table leaders appearing in az5, extra replicas)]

Results after adjusting the plan

[Figure: leader and replica distribution after the adjustment]

Impact of fetching TSO across cities, and how we explored it

Problem description and preliminary analysis

In the stress test, az1, az2, az3, and az4 each carried 25% of the traffic. The traffic matched the placement of the Region leaders, yet the response latencies of the four AZs were inconsistent.
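
A rough, illustrative estimate rather than a measured breakdown: a write transaction fetches TSO from the PD leader at least twice (for start_ts and commit_ts), so with the PD leader in cd and the 3 ms latency between cd and ya given above, a TiDB node in ya pays roughly

$$\Delta t_{\text{TSO}} \approx 2 \times 3\,\text{ms} = 6\,\text{ms}$$

of extra latency per transaction compared with the AZs local to the PD leader, which is consistent with the uneven response times observed.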

[Figure: inconsistent response latencies across the four AZs]

Measurements confirm the impact of fetching TSO across cities

[Figure: measured latency impact of cross-city TSO fetches]

Optimization

Split the single cluster into four clusters, so that each traffic unit has its own PD and fetches TSO locally.

[Figure: topology after splitting into four clusters]

Disaster recovery and traffic switching

Requirements

1. When a data-center-level disaster occurs, traffic must be switched. For best performance, both the PD leader and the Region leaders should follow the traffic as closely as possible.

2. If one data center in a city fails, traffic is switched first to the other data center in the same city.

3. If both data centers in a city fail, for example az1 and az2 in cd, all traffic is switched to az3 and az4, but never to az5.

PD leader switching

Assign priorities to the PD members to ensure that, during a disaster, the PD leader is preferentially scheduled to a node in the same city.
Interactive mode:
tiup ctl:v<CLUSTER_VERSION> pd -i -u http://127.0.0.1:2379
Taking az1 traffic as an example, set the PD leader scheduling priorities:
tiup ctl:v7.1.0 pd member leader_priority pd-1 5
tiup ctl:v7.1.0 pd member leader_priority pd-2 3
tiup ctl:v7.1.0 pd member leader_priority pd-3 1
tiup ctl:v7.1.0 pd member leader_priority pd-4 1
tiup ctl:v7.1.0 pd member leader_priority pd-5 0
Manual PD leader transfer (to avoid instability after the switch, adjust the scheduling priorities first):
tiup ctl:v7.1.0 pd member leader transfer pd-3
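
After setting the priorities, the member list and the current leader can be checked with pd-ctl (the PD endpoint is illustrative):

tiup ctl:v7.1.0 pd -u http://127.0.0.1:2379 member             # lists PD members, including their priorities
tiup ctl:v7.1.0 pd -u http://127.0.0.1:2379 member leader show # shows the current PD leader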

Region leader switching

The initial (problematic) switching approach:

Step 1:
Suppose the Region leaders originally placed in az1 need to be switched to az2. Run a query to generate the statements, roughly 2,500+ DDLs:
SELECT CONCAT('ALTER TABLE ', table_schema, '.', table_name, ' PLACEMENT POLICY = p2;') FROM information_schema.tables WHERE RIGHT(table_schema, 2) BETWEEN '00' AND '24' ORDER BY table_schema;
Step 2:
Execute the 2,500+ generated DDLs.
Problem: the switch takes too long.
Operation at the database layer: ALTER TABLE xx PLACEMENT POLICY = az2; -- previously az1
Total time: 28 minutes

The optimized switching approach:

Take a different approach: instead of re-binding tables to a different policy, directly change the contents of the policy they are already bound to:
ALTER PLACEMENT POLICY p1 LEADER_CONSTRAINTS="+az=az2" FOLLOWER_CONSTRAINTS="{+az=az1: 1,+az=az3: 1,+az=az4: 1,+az=az5: 1}";
The switch takes about 3 minutes.
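
Failing traffic back works the same way, and the resulting scheduling can be observed from SQL; a brief sketch assuming the policy names above:

-- Move the leaders of everything bound to p1 back to az1 by editing the policy in place
ALTER PLACEMENT POLICY p1 LEADER_CONSTRAINTS="+az=az1" FOLLOWER_CONSTRAINTS="{+az=az2: 1,+az=az3: 1,+az=az4: 1,+az=az5: 1}";
-- Check whether the scheduling triggered by the policy change has finished
SHOW PLACEMENT;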

Closing thoughts

1. A POC (proof of concept) is a very good way to exercise a database's capabilities and helps us verify and understand its various features.

2. This article shares only three of the issues encountered during the whole testing process, in the hope of helping TiDB users with similar needs.

Author introduction: BraveChen, from the Digital China Titanium team, a professional technical team dedicated to providing enterprises with end-to-end distributed TiDB database solutions. Team members have rich database industry backgrounds, all hold the advanced TiDB qualification certificate, and are active in the TiDB open source community; the team is an officially certified partner. So far it has provided professional TiDB delivery services to 10+ customers, covering key industries such as finance, securities, logistics, electric power, government, and retail.
