A preliminary understanding of the TiDB storage engine

Introduction to TiDB

TiDB is an open-source distributed relational database designed and developed by PingCAP. It is a converged distributed database that supports both online transaction processing and online analytical processing (Hybrid Transactional and Analytical Processing, HTAP). Its key features include horizontal scale-out and scale-in, financial-grade high availability, real-time HTAP, a cloud-native distributed design, and compatibility with the MySQL 5.7 protocol and the MySQL ecosystem. The goal is to provide users with one-stop OLTP (Online Transactional Processing), OLAP (Online Analytical Processing), and HTAP solutions. TiDB is suitable for application scenarios that demand high availability, strong consistency, and large data scale.

Distributed Systems

A distributed system is a system whose components are located on different networked computers and communicate and coordinate by passing messages to one another. These components interact to achieve a common goal. In other words, a distributed system takes a large computation or data set, divides it into small pieces that multiple computers compute and store separately, and then merges the partial results into a final result. In essence, it applies divide and conquer to data storage and computation.
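The divide-and-conquer idea can be sketched in a few lines of Python. This is only an illustration, not how TiDB itself distributes work: worker threads stand in for separate computers, and the helper names `split_into_chunks` and `distributed_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_chunks):
    """Split data into at most num_chunks contiguous, roughly equal pieces."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    size = max(1, -(-len(data) // num_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def distributed_sum(data, num_workers=4):
    """Sum each chunk on its own worker, then merge the partial results."""
    chunks = split_into_chunks(data, num_workers)
    # Each worker thread plays the role of one machine (an assumption
    # made for illustration; real nodes would communicate over a network).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # Merge step: combine the partial results into the final answer.
    return sum(partial_sums)


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 101))))
```

The split, the independent partial computation, and the final merge correspond to the three steps described above.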

CAP theorem

Consistency

All nodes hold consistent data, so every node sees the same data at the same time.

Availability

The service stays available and responds within a normal response time; read and write requests always succeed.

Partition tolerance

A distributed system can still provide external services that satisfy consistency or availability when a node fails or the network is partitioned.

Application scenarios

  • Scenarios with financial industry attributes that require high data consistency, high reliability, high system availability, scalability, and disaster recovery.

TiDB uses multiple replicas and the Multi-Raft protocol to schedule data across different data centers, racks, and machines. When some machines fail, the system fails over automatically, keeping the RTO (recovery time objective) <= 30s and the RPO (recovery point objective) = 0.
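How many machine failures a replicated group can survive follows from the majority rule that Raft relies on: a group keeps working as long as more than half of its replicas are reachable. The sketch below only does that arithmetic. The function names are hypothetical, and the replica counts of 3 and 5 are illustrative assumptions, not values taken from this article.

```python
def quorum_size(replicas):
    """Smallest number of replicas that forms a majority."""
    if replicas < 1:
        raise ValueError("a group needs at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """Number of replicas that can fail while a majority survives."""
    return replicas - quorum_size(replicas)


def group_is_available(replicas, failed):
    """True if the surviving replicas still form a majority."""
    return replicas - failed >= quorum_size(replicas)


if __name__ == "__main__":
    for n in (3, 5):  # illustrative replica counts
        print(n, "replicas tolerate", tolerated_failures(n), "failure(s)")
```

Under this rule, a group of three replicas tolerates one failure and a group of five tolerates two, which is why spreading replicas across racks or data centers matters.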

  • Massive data and high-concurrency OLTP scenarios that require high storage capacity, scalability, and concurrency.

TiDB adopts an architecture that separates computing from storage, so each can be scaled out or in independently. The computing layer supports a maximum of 512 nodes, each node supports a maximum concurrency of 1000, and cluster capacity scales up to the PB level.

  • Real-time HTAP scenario

TiDB introduced the columnar storage engine TiFlash in version 4.0 and combined it with the row-based storage engine TiKV to build a true HTAP database. For a small increase in storage cost, online transaction processing and real-time data analysis can run in the same system, which greatly reduces costs for enterprises.

  • Data aggregation and secondary processing scenarios

The business synchronizes data to TiDB through ETL tools or TiDB's synchronization tools, and reports can then be generated directly in TiDB with SQL. (ETL is the process of extracting data from business systems, cleaning and transforming it, and loading it into a data warehouse. Its purpose is to integrate an enterprise's scattered, messy, and non-standardized data to provide a basis for analysis and decision-making.)

Relational model

In traditional online transaction scenarios, the relational model is still the standard. The key requirement of a relational database is that it supports transactions.

Transactions

In essence, a transaction is the unit of concurrency control: a sequence of operations defined by the user. These operations are either all performed or not performed at all, forming an indivisible unit of work. This ensures that the system is always in a complete and correct state.

ACID properties

Atomicity

All operations included in a transaction form an indivisible whole: either all of them are executed or none of them are.
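The all-or-nothing behavior can be seen in a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a relational database, not TiDB itself; the `accounts` table, the names, and the balances are made up for illustration.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort midway: the debit above must not survive.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    conn = make_demo_db()
    print(transfer(conn, "alice", "bob", 30), balances(conn))
    print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer fails after the debit has already run, yet the balances are unchanged afterwards because the whole transaction is rolled back.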

Consistency

Before and after a transaction, all data remains in a consistent state; the database's integrity constraints must never be violated.
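One way to picture a consistency rule that cannot be violated is a declared constraint that the database checks on every write. The sketch below again uses Python's built-in `sqlite3` module as a stand-in, not TiDB; the `stock` table and its `CHECK (quantity >= 0)` rule are hypothetical examples.

```python
import sqlite3


def make_inventory_db():
    """Create an in-memory table whose quantity may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock ("
        "item TEXT PRIMARY KEY, "
        "quantity INTEGER NOT NULL CHECK (quantity >= 0))"
    )
    conn.execute("INSERT INTO stock VALUES ('widget', 5)")
    conn.commit()
    return conn


def remove_stock(conn, item, count):
    """Try to remove count units; refuse if the constraint would break."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE stock SET quantity = quantity - ? WHERE item = ?",
                (count, item),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the write; the data is unchanged.
        return False


def quantity(conn, item):
    """Return the stored quantity for item."""
    row = conn.execute(
        "SELECT quantity FROM stock WHERE item = ?", (item,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = make_inventory_db()
    print(remove_stock(conn, "widget", 3), quantity(conn, "widget"))
    print(remove_stock(conn, "widget", 10), quantity(conn, "widget"))
```

The rejected update leaves the table exactly as it was, so the data moves only from one valid state to another.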

Isolation

Isolation describes the degree to which concurrent transactions affect one another. It governs how multiple concurrent transactions behave when they access the same data resource. Different isolation levels address different phenomena (dirty reads, non-repeatable reads, phantom reads, and so on).

Durability

Once a transaction completes, the changes it made to the data must be permanently recorded; this includes storing the data and backing it up over the network to multiple replicas.

Comparison with traditional non-distributed database architecture

  • Both support ACID and strongly consistent transactions.
  • TiDB has a distributed architecture with decoupled components, so it scales well and supports flexible scale-out and scale-in.
  • High availability is supported by default. When a minority of replicas fail, the database fails over automatically, and the failover is transparent to the business.
  • Because it scales horizontally, it has an inherent advantage in business scenarios with large data volumes and high throughput.
  • Its strength lies not in the response time of lightweight, simple SQL statements, but in the throughput of a large number of highly concurrent SQL statements.

TiDB distributed database overall architecture

  • It is composed of multiple modules that communicate with one another to form a complete TiDB system.
  • The front end is stateless and the back end is stateful (Raft).
  • It is compatible with MySQL.

Origin blog.csdn.net/m0_68678128/article/details/134962865