Introduction to POLARDB

1. Introduction


  • POLARDB is a next-generation distributed relational database developed in-house by Alibaba Cloud. It is 100% compatible with MySQL: applications that previously used MySQL can switch to POLARDB without modifying a single line of code.
  • POLARDB runs as a multi-node cluster containing one Writer node (the master node) and multiple Reader nodes. All nodes share the same underlying storage (PolarStore) through a distributed file system (PolarFileSystem).
  • POLARDB serves applications through an internal proxy layer (Proxy): every request passes through the proxy before reaching a specific database node.
  • The Proxy performs security authentication (Authorization) and protection (Protection).
  • It parses SQL and routes write operations (transactions, Update, Insert, Delete, DDL, etc.) to the Writer node.
  • It parses SQL and distributes read operations (such as Select) evenly across the Reader nodes; this is known as read-write separation.
  • POLARDB provides two database addresses by default: the cluster address (Cluster) and the primary address (Primary).
  • The cluster address is recommended, because it provides read-write separation and pools the resources of all nodes to serve the application.
  • The primary address always points to the primary node, and SQL sent to it is executed on the primary node. When a primary-standby switchover (Failover) occurs, the primary address automatically drifts to the new primary node within 30 seconds, ensuring the application is always connected to a node that is both writable and readable.
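The read-write separation described above can be sketched as follows. This is a toy model, not PolarProxy's actual implementation: the node addresses (`writer-node`, `reader-1`, `reader-2`) are hypothetical, and real SQL parsing is far more thorough than this keyword check.

```python
import itertools

# Hypothetical node addresses -- real PolarDB endpoints are provisioned
# by Alibaba Cloud, not hard-coded like this.
WRITER = "writer-node:3306"
READERS = ["reader-1:3306", "reader-2:3306"]

_reader_cycle = itertools.cycle(READERS)

WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP", "BEGIN")

def route(sql: str) -> str:
    """Return the node that should execute this statement."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb in WRITE_VERBS:
        return WRITER           # all writes go to the single Writer node
    return next(_reader_cycle)  # reads are spread evenly across Readers

print(route("UPDATE t SET a = 1"))  # writer-node:3306
print(route("SELECT * FROM t"))     # reader-1:3306
print(route("SELECT * FROM t"))     # reader-2:3306
```

From the application's point of view there is only one address; the proxy decides per statement which node actually runs it.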

2. Characteristics

Besides being usable exactly like MySQL, POLARDB offers several advantages that a traditional MySQL database does not.

  • Large capacity

Storage scales up to 100 TB, so you no longer need to buy multiple MySQL instances for sharding because of a single machine's capacity ceiling, nor even consider splitting databases and tables. This simplifies application development and reduces the operations and maintenance burden.

  • High cost-effectiveness

Multiple nodes are billed for only one copy of storage, so the more read-only nodes you add, the more cost-effective the cluster becomes.

  • Minute-level elasticity

The storage-compute separation architecture, combined with shared storage, makes rapid specification upgrades a reality.

  • Read consistency

The cluster address separates reads and writes, and uses the LSN (Log Sequence Number) to guarantee global consistency when reading data, avoiding the inconsistency problems caused by primary-replica replication lag.
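The idea behind LSN-based read consistency can be illustrated with a toy model (this is an illustration of the concept, not the real PolarDB API): the session remembers the LSN its last write reached, and a replica may serve a read only once its applied LSN has caught up to that point.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.applied_lsn = 0   # how far this replica has replayed the redo log

class Session:
    """Toy model of LSN-based read consistency (not the real PolarDB API)."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.last_write_lsn = 0

    def write(self, writer_lsn):
        # After a write, remember the LSN the writer reached.
        self.last_write_lsn = writer_lsn

    def pick_replica(self):
        # Only a replica that has applied at least our last write's LSN
        # can serve a consistent read; otherwise fall back to the writer.
        for r in self.replicas:
            if r.applied_lsn >= self.last_write_lsn:
                return r.name
        return "writer"

r1, r2 = Replica("reader-1"), Replica("reader-2")
s = Session([r1, r2])
s.write(writer_lsn=100)
print(s.pick_replica())   # "writer" -- no replica has caught up yet
r2.applied_lsn = 120
print(s.pick_replica())   # "reader-2" -- now safe to read from a replica
```

This is why a read issued right after a write never observes stale data, even though it may be served by a Reader node.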

  • Millisecond latency – physical replication

Redo-based physical replication replaces Binlog-based logical replication, improving the efficiency and stability of primary-replica replication. Even DDL on large tables, such as adding an index or adding a column, does not cause replication lag.

  • Lockless backup

Using storage-layer snapshots, a database of 2 TB can be backed up within 60 seconds. The backup does not lock the database, has almost no impact on applications, and can run at any time of day.

  • Complex SQL query acceleration

The built-in parallel query engine significantly accelerates complex analytical SQL that takes more than a minute to execute. This feature requires a separate connection address.

3. PolarFS

  • PolarFS uses the following techniques in its design to fully exploit I/O performance:
  • PolarFS processes I/O with a single-threaded finite state machine pinned to a CPU core, avoiding the context-switching overhead of a multi-threaded I/O pipeline.
  • PolarFS optimizes memory allocation, using a MemoryPool to reduce the cost of constructing and destroying memory objects, and using huge pages to reduce the cost of paging and TLB updates.
  • PolarFS uses a centralized structure with local autonomy: all metadata is cached in memory in each component of the system, essentially eliminating extra metadata I/O.
  • PolarFS uses a fully user-space I/O stack, including RDMA and SPDK, to avoid the overhead of the kernel's network and storage stacks.

In comparative tests on the same hardware, the write latency of three-replica data blocks in PolarFS is close to that of a single-copy local SSD. This greatly improves POLARDB's single-instance TPS while still ensuring data reliability.

4. PolarDB log

PolarDB pioneered the use of physical logs (Redo logs) in place of traditional logical logs. This not only greatly improves the efficiency and accuracy of replication, but also saves about 50% of I/O operations; for write-heavy databases in particular, performance can improve by more than 50%.


5. PolarProxy

The purpose of PolarProxy is to pool the resources of multiple underlying compute nodes and provide a unified entry point for applications. This greatly reduces the cost of using the database and eases migration and switchover from legacy systems to POLARDB.
In essence, PolarProxy is a capacity-adaptive, distributed, stateless database proxy cluster. Its ability to scale out horizontally maximizes POLARDB's advantage of rapidly adding and removing read nodes and raises the throughput of the whole cluster: the more ECS instances accessing it and the higher the concurrency, the more pronounced the advantage.


Origin: blog.csdn.net/qq_39813400/article/details/129082696