TiDB architecture

  Compared with traditional stand-alone databases, TiDB has the following advantages:

  • Purely distributed architecture with good scalability; the cluster can be scaled out and scaled in flexibly
  • Speaks SQL, exposes the MySQL wire protocol externally, and is compatible with most MySQL syntax, so it can directly replace MySQL in many scenarios
  • High availability by default: when a minority of replicas fail, the database repairs data and fails over automatically, transparently to the application
  • Supports ACID transactions, which suits scenarios with strong consistency requirements, such as bank transfers
  • Has a rich toolchain ecosystem covering scenarios such as data migration, replication, and backup

  In terms of kernel design, TiDB splits the overall architecture into several modules that communicate with one another to form a complete TiDB system. The corresponding architecture diagram is as follows:

[TiDB architecture diagram]

  • TiDB Server: the SQL layer. It exposes the MySQL protocol connection endpoint externally, accepts client connections, performs SQL parsing and optimization, and finally generates a distributed execution plan. The TiDB layer itself is stateless: in practice, multiple TiDB instances can be started and a unified access address provided through a load-balancing component (such as LVS, HAProxy, or F5), so that client connections are distributed evenly across the TiDB instances. TiDB Server does not store data itself; it only parses SQL and forwards the actual data read requests to the underlying storage nodes, TiKV (or TiFlash).
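
  Because the TiDB layer is stateless and speaks the MySQL wire protocol, an application can connect through a single load-balanced address with any MySQL client or driver. The following Go sketch assumes the go-sql-driver/mysql driver and uses a hypothetical load-balancer hostname, database, credentials, and accounts table; it also illustrates the ACID bank-transfer point from the advantages list above.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // TiDB speaks the MySQL wire protocol
)

func main() {
	// Hypothetical load-balancer address (e.g. HAProxy) fronting several
	// stateless TiDB instances; 4000 is TiDB's default SQL port.
	db, err := sql.Open("mysql", "app:secret@tcp(tidb-lb.example.com:4000)/bank")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A simple transfer wrapped in a single ACID transaction.
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	if _, err = tx.Exec("UPDATE accounts SET balance = balance - ? WHERE id = ?", 100, 1); err != nil {
		tx.Rollback()
		log.Fatal(err)
	}
	if _, err = tx.Exec("UPDATE accounts SET balance = balance + ? WHERE id = ?", 100, 2); err != nil {
		tx.Rollback()
		log.Fatal(err)
	}
	if err = tx.Commit(); err != nil {
		log.Fatal(err)
	}
	log.Println("transfer committed")
}
```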

  • PD Server: the metadata management module of the entire TiDB cluster. It stores the real-time data distribution of every TiKV node and the overall topology of the cluster, serves the TiDB Dashboard management UI, and allocates transaction IDs (timestamps) for distributed transactions. PD not only stores metadata but also issues data-scheduling commands to specific TiKV nodes based on the data distribution that TiKV nodes report in real time, so it can be regarded as the "brain" of the whole cluster. PD itself runs as a group of at least three nodes for high availability, and deploying an odd number of PD nodes is recommended.
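
  PD's role as the cluster's timestamp allocator can be pictured with a small sketch. The Go snippet below is only a conceptual, single-node illustration of a hybrid physical-plus-logical timestamp oracle that hands out strictly increasing transaction timestamps; it is not PD's actual implementation, and all names in it are illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tso is a toy timestamp oracle: each timestamp combines a physical part
// (milliseconds) with a logical counter, so values are strictly increasing
// even when many are requested within the same millisecond.
type tso struct {
	mu       sync.Mutex
	physical int64 // last physical time used, in milliseconds
	logical  int64 // counter within the same millisecond
}

const logicalBits = 18 // low bits reserved for the logical counter

func (t *tso) next() int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	now := time.Now().UnixMilli()
	if now > t.physical {
		t.physical, t.logical = now, 0
	} else {
		t.logical++ // same millisecond: bump the logical counter
	}
	return t.physical<<logicalBits | t.logical
}

func main() {
	var ts tso
	fmt.Println(ts.next(), ts.next()) // two strictly increasing timestamps
}
```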

Storage nodes

  • TiKV Server: responsible for storing data. Externally, TiKV is a distributed, transactional Key-Value storage engine. The basic unit of data storage is the Region: each Region stores the data of one Key Range (a left-closed, right-open interval from StartKey to EndKey), and each TiKV node serves multiple Regions (a simplified routing sketch follows this list). TiKV's API natively supports distributed transactions at the key-value level and provides the SI (Snapshot Isolation) level by default, which is the core of TiDB's support for distributed transactions at the SQL level. After TiDB's SQL layer finishes parsing a statement, it converts the execution plan into actual calls to the TiKV API, so the data is ultimately stored in TiKV. In addition, data in TiKV is automatically kept in multiple replicas (three by default), giving it native high availability and automatic failover.
  • TiFlash: a special type of storage node. Unlike ordinary TiKV nodes, TiFlash stores data in columnar form, and its main role is to accelerate analytical workloads.
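
  To make the Region concept concrete, here is a minimal Go sketch of routing a key to the Region whose left-closed, right-open range [StartKey, EndKey) contains it. The struct fields and the example boundaries are hypothetical and greatly simplified compared with TiKV's real data structures.

```go
package main

import (
	"fmt"
	"sort"
)

// region describes a key range [StartKey, EndKey): left-closed, right-open.
// An empty EndKey stands for "unbounded". Field names are illustrative only.
type region struct {
	ID       uint64
	StartKey string
	EndKey   string
}

// locate returns the Region whose range contains key, assuming regions are
// sorted by StartKey and cover the whole key space without gaps.
func locate(regions []region, key string) region {
	i := sort.Search(len(regions), func(i int) bool {
		return regions[i].StartKey > key
	})
	return regions[i-1] // last Region whose StartKey <= key
}

func main() {
	regions := []region{
		{ID: 1, StartKey: "", EndKey: "g"},
		{ID: 2, StartKey: "g", EndKey: "p"},
		{ID: 3, StartKey: "p", EndKey: ""},
	}
	fmt.Println(locate(regions, "g").ID)     // 2: "g" falls in [g, p), not [ , g)
	fmt.Println(locate(regions, "zebra").ID) // 3: [p, +inf)
}
```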

Refer to the official document:
https://docs.pingcap.com/zh/tidb/stable/tidb-architecture

Original post: blog.csdn.net/qq_42979842/article/details/108352294