Huawei Cloud Database GaussDB (for openGauss): A First Look

Abstract: This article introduces GaussDB (for openGauss) in terms of its overall architecture, main scenarios, and key technical features.

1. Background introduction


On March 16, during the first live broadcast in the GaussDB (for openGauss) technical series hosted by Huawei Cloud, "Understanding Huawei Cloud Database GaussDB (for openGauss)", a viewer asked: open source databases are so appealing, so why did Huawei invest so much effort in developing GaussDB (for openGauss) itself?

In fact, many open source databases are weak in ease of use and supporting tooling, and they require constant maintenance. Moreover, once data is lost, it is hard to recover quickly, and the resulting damage can be immeasurable. As a result, cloud-hosted open source databases mostly address the needs of small and medium-sized enterprises, such as simplified deployment, operation and maintenance, tuning, and high cost-effectiveness.

At the same time, users of open source databases face a range of costs, large and small: servers, database maintenance and upgrades, and operations staff. This makes it hard to keep up with rapid business expansion and sustainable growth. Large organizations in finance, government, and enterprise have stringent requirements for data security, response speed, reliability, and availability, so they need enterprise-level database services with ultra-high availability, complete functionality, excellent performance, an open ecosystem, and highly elastic scaling.

GaussDB (for openGauss) is an enterprise-level distributed relational database that Huawei developed in-house on top of the openGauss ecosystem, drawing on its many years of experience in the database field and on the needs of enterprise scenarios. It currently supports two deployment forms: single-shard and distributed. While supporting traditional services, it continues to build competitive features that help enterprises meet the challenges of the 5G era.

To help everyone get to know GaussDB (for openGauss) quickly, the Huawei Cloud database team prepared a series of GaussDB (for openGauss) technical live broadcasts. This article covers the content of the first broadcast: the overall architecture, main scenarios, and key technical features.

2. Overall architecture: unified distributed architecture based on data sharding

[Figure: overall architecture of GaussDB (for openGauss)]

GaussDB (for openGauss) uses a unified distributed architecture based on data sharding (shared nothing). The underlying data is spread across different data nodes according to rules such as hash, list, or range, and multiple data nodes take part in computation together. Data nodes can be scaled out, and at the upper layer the coordinator nodes parse and forward SQL.

As the figure shows, the architecture mainly consists of three types of nodes: coordinator nodes, data nodes, and cluster management nodes (of which the most important is the global transaction manager). The coordinator node parses and forwards SQL, acting much like a proxy; the data node handles computation and data storage; and the global transaction manager ensures read consistency for global transactions.
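To make the hash, list, and range rules more concrete, here is a minimal Python sketch of how a row's distribution key might be mapped to a data node. The node names, range boundaries, and list values are hypothetical, and the hash function is only a stand-in; GaussDB's actual routing logic is not described in this article.

```python
import hashlib
from bisect import bisect_right

DATA_NODES = ["dn1", "dn2", "dn3"]  # hypothetical data node names


def hash_shard(key, nodes=DATA_NODES):
    """Hash rule: pick a node from a stable hash of the distribution key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def range_shard(key, upper_bounds=(1000, 2000), nodes=DATA_NODES):
    """Range rule: keys below 1000 -> dn1, 1000..1999 -> dn2, the rest -> dn3."""
    return nodes[bisect_right(upper_bounds, key)]


# List rule: explicit value -> node mapping (values are illustrative only).
LIST_RULES = {"north": "dn1", "south": "dn2"}


def list_shard(key, rules=LIST_RULES, default="dn3"):
    """List rule: look the key up in the mapping, falling back to a default node."""
    return rules.get(key, default)
```

In this sketch a coordinator would call one of these functions with the row's distribution key to decide which data node receives the statement.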

This architecture builds the following core advantages for GaussDB (for openGauss):

  1. Extremely high availability: two-region, three-center architecture with real-time cross-region disaster recovery
  2. Data security: strong data consistency across AZs, ensuring zero data loss
  3. High scalability: containerized deployment; performance and capacity scale out horizontally on demand, up to 1000+ nodes
  4. Strong performance: 12 million tpmC on 32 nodes of 2-socket Kunpeng servers (Huawei internal test)
  5. Full-stack self-developed and controllable software and hardware: industry-leading Kunpeng hardware plus the self-developed, open openGauss kernel

3. Main scenarios

Scenario 1: Traditional core transactions

Traditional applications can use the single-shard mode, which works like a traditional active-standby setup. With in-depth optimization for Kunpeng, GaussDB (for openGauss) delivers outstanding performance and greatly improved usability, making it well suited to replacing traditional commercial databases.


Scenario 2: Future massive-scale transactions

With the arrival of the 5G era, a single node struggles to handle ever-growing data volumes while maintaining performance, whereas a horizontally scalable, multi-node database can meet the demand for large-scale computing and storage. In distributed mode, GaussDB (for openGauss) supports 1000+ nodes, PB-level storage, and strongly consistent distributed transactions, which meets the Internet+ needs of government, transportation, finance, energy, and other industries.


Key roles

To help readers better understand how GaussDB (for openGauss) operates, some of its key roles are introduced below:

[Figures: key roles of GaussDB (for openGauss)]

4. Key technical features

Building on a distributed architecture that separates computing from storage, GaussDB (for openGauss) has six core technical features, each explained below.

Key technology 1: High performance (distributed execution framework)


The general execution flow is:

  1. The business application sends SQL to the Coordinator; the SQL can include create, read, update, and delete (CRUD) operations on data.
  2. The Coordinator uses the database optimizer to generate an execution plan, and each data node (DN) processes data as the plan requires.
  3. Data is distributed across the DNs using a consistent hash algorithm, so a DN may need to fetch data from other DNs during processing. GaussDB provides three kinds of stream (broadcast, aggregation, and redistribution) to move data between DNs.
  4. Each DN returns its result set to the Coordinator for aggregation.
  5. The Coordinator returns the aggregated result to the business application.
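The flow above can be illustrated with a small Python sketch. It models the three streams as plain functions over in-memory lists and runs a GROUP BY on a column that is not the distribution key. All function names, the `orders` example, and the use of `zlib.crc32` as a hash are assumptions made for illustration; they are not GaussDB interfaces.

```python
import zlib
from collections import Counter


def node_for(key, n_nodes):
    # crc32 is only a stable stand-in for the database's real hash function.
    return zlib.crc32(str(key).encode("utf-8")) % n_nodes


def redistribute(shards, key_fn):
    """Redistribution stream: re-hash every row so equal keys meet on one DN."""
    out = [[] for _ in shards]
    for rows in shards:
        for row in rows:
            out[node_for(key_fn(row), len(shards))].append(row)
    return out


def broadcast(rows, n_nodes):
    """Broadcast stream: every DN receives a full copy of a (small) row set."""
    return [list(rows) for _ in range(n_nodes)]


def gather(partials):
    """Aggregation stream: the Coordinator collects the per-DN results."""
    merged = []
    for part in partials:
        merged.extend(part)
    return merged


def count_orders_by_city(order_shards):
    """SELECT city, count(*) FROM orders GROUP BY city, with orders sharded by id."""
    # Step 3: rows are sharded by id, so move them until each city lives on one DN.
    by_city = redistribute(order_shards, key_fn=lambda row: row["city"])
    # Each DN aggregates its local rows.
    partials = [sorted(Counter(r["city"] for r in rows).items()) for rows in by_city]
    # Steps 4 and 5: the Coordinator gathers and returns the combined result.
    return dict(gather(partials))
```

Because redistribution places every row of a given city on exactly one DN, the per-DN counts never overlap and the Coordinator can simply concatenate them.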

Huawei has many years of experience in SQL execution optimization. Even complex SQL and hybrid transactional/analytical processing (HTAP) workloads can be executed efficiently, thanks to capabilities such as:

  • Cost-based optimization
      • Cardinality estimation: feedback enhancement, AI-based cardinality enhancement
      • Cost estimation: row-store/column-store cost estimation, network communication cost estimation
      • Search algorithms: dynamic programming, genetic algorithm, AI search
  • Distributed execution plan capabilities
      • Light Proxy
      • Fast Query Shipping
      • Remote Query Shipping
  • Self-developed Cascade optimizer
      • Object-based handling of rule application and search tasks
      • Branch-and-bound pruning

Through its distributed query engine, distributed scheduling engine, and distributed storage engine, GaussDB (for openGauss) shards data automatically and balances load automatically, while the query optimizer keeps improving the efficiency of execution plans. Data nodes use scenario-specific streams (broadcast, aggregation, and redistribution) to make exchanges between sharded data nodes more efficient, and results are aggregated automatically while the global consistency of distributed transactions is preserved.

Key technology 2: High performance (distributed transaction processing with GTM-Lite)


The advantages of this feature are:

  • High-performance transaction management: supports lock-free, multi-version, high-concurrency transaction processing.
  • Distributed strong consistency: the distributed GTM-Lite solution manages global transaction snapshots and commit numbers, achieving strong consistency without a performance bottleneck at a central node.
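As a rough illustration of how snapshots and commit numbers can give consistent reads without read locks, the following Python sketch implements a generic multi-version visibility check. It is an assumption-laden simplification: the class names are hypothetical, everything runs in one process, and it does not reproduce the actual GTM-Lite protocol.

```python
import itertools


class CommitNumberService:
    """Hypothetical stand-in for the component that issues snapshots and commit numbers."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.last_committed = 0

    def snapshot(self):
        # A snapshot is simply the highest commit number issued so far.
        return self.last_committed

    def commit(self):
        self.last_committed = next(self._counter)
        return self.last_committed


class VersionedStore:
    """Keeps every committed version of a key, tagged with its commit number."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_number, value), oldest first

    def write(self, key, value, commit_number):
        self._versions.setdefault(key, []).append((commit_number, value))

    def read(self, key, snapshot):
        # A version is visible only if it was committed at or before the snapshot.
        visible = [v for number, v in self._versions.get(key, []) if number <= snapshot]
        return visible[-1] if visible else None


service = CommitNumberService()
store = VersionedStore()
store.write("balance", 100, service.commit())
old_snapshot = service.snapshot()
store.write("balance", 80, service.commit())
```

A reader holding `old_snapshot` still sees the value 100 after the second write commits, while a reader taking a fresh snapshot sees 80; neither reader blocks the writer.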

Key technology 3: High performance (scale-up capability): a new NUMA-aware architecture that breaks through the performance limits of Kunpeng 4P servers


GaussDB (for openGauss) applies a series of NUMA-aware optimizations based on the multi-core NUMA architecture of the Kunpeng processor. It binds threads to cores to avoid cross-core memory access and reduce latency, and it uses key techniques such as batched redo log insertion, NUMA-aware placement of hot data, and Clog partitioning to make full use of multi-core computing power while reducing access latency, log write conflicts, and index update conflicts. On TaiShan Kunpeng servers, TPC-C stress-test performance is currently 1.5 times that of x86 servers of the same specification.

Key technology 4: High availability (cluster HA with multi-level redundancy and no single point of failure)
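The core-binding idea can be sketched as a simple placement plan: spread worker threads evenly over NUMA nodes and pin each one to a core on its node, so its memory accesses stay local. The function below is a hypothetical Python illustration that assumes cores are numbered contiguously within each NUMA node; it only computes the plan and does not reflect how GaussDB binds its threads.

```python
def numa_core_plan(n_workers, n_nodes, cores_per_node):
    """Assign workers round-robin to NUMA nodes, then to cores inside each node."""
    plan = []
    for worker in range(n_workers):
        node = worker % n_nodes
        slot = (worker // n_nodes) % cores_per_node
        # Assumption: core ids are contiguous per node (node 0 owns cores 0..cores_per_node-1).
        plan.append({"worker": worker, "numa_node": node, "core": node * cores_per_node + slot})
    return plan
```

On Linux, a plan like this could be applied per thread with `os.sched_setaffinity`, provided the listed cores exist on the machine.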

GaussDB (for openGauss) eliminates single points of failure across the system's software and hardware through hardware redundancy, instance redundancy, and data redundancy. Unlike traditional database software products, GaussDB (for openGauss) relies mainly on software capabilities to provide high availability and high reliability. On top of this software and hardware foundation, Huawei Cloud delivers end-to-end database high availability and supports end-to-end monitoring and detection across all scenarios, so that user applications stay online, no data is lost, and the full stack has no single point of failure.


High-availability technical points

Hardware high availability:

  • Storage: disk RAID redundancy
  • Network: dual-switch redundancy
  • Network card: multi-NIC redundancy
  • Host: UPS power protection

Software high availability:

  • Coordinator node (CN) instances: multi-active redundancy
  • Data node, global transaction manager, and cluster manager instances: active-standby redundancy

Fault detection and handling:

  • Network faults (switches, routers, and so on)
  • Network card faults (local NIC failure detection)
  • Disk faults: disk heartbeat and handling of error codes returned by the file system
  • Host power failures: heartbeat mechanism
  • Cluster instance failures (abnormal termination of CN/DN/GTM processes)
  • Cluster software failures
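Heartbeat-based detection combined with active-standby redundancy can be sketched in a few lines of Python. The class, the instance names, and the timeout value are hypothetical, and timestamps are passed in explicitly to keep the example deterministic; a real cluster manager handles many more cases (network partitions, repeated failures, fencing).

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat of each instance and reports the ones that went silent."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_seen = {}

    def beat(self, instance, now):
        self._last_seen[instance] = now

    def failed(self, now):
        return sorted(name for name, seen in self._last_seen.items()
                      if now - seen > self.timeout_s)


def fail_over(pairs, failed_instances):
    """Promote the standby of every shard whose primary is reported as failed."""
    result = {}
    for shard, pair in pairs.items():
        if pair["primary"] in failed_instances and pair["standby"] not in failed_instances:
            result[shard] = {"primary": pair["standby"], "standby": pair["primary"]}
        else:
            result[shard] = dict(pair)
    return result
```

For example, if `dn1-a` stops sending heartbeats while `dn1-b` keeps sending them, `failed()` reports `dn1-a` and `fail_over()` swaps the roles of the two instances for that shard.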

Key technology 5: High availability (cross-AZ/region disaster recovery)


GaussDB (for openGauss) currently supports two disaster recovery modes: single-cluster, cross-AZ active-active deployment within one city, with RPO = 0 and RTO < 60 s; and dual-cluster, cross-region deployment in a two-region, three-center layout, with RPO < 10 s and RTO < 10 min. While providing cross-region disaster recovery, the solution lets the disaster recovery cluster run with a minimal number of nodes, which effectively reduces the user's disaster recovery cost. It also lets users scale out the disaster recovery nodes online after they are promoted to primary in a failure scenario, keeping the business uninterrupted and improving the reliability and availability of the user's original disaster recovery instance.

Key technology 6: High scalability (online horizontal scale-out)
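The difference between RPO = 0 and a small non-zero RPO can be illustrated by comparing synchronous and asynchronous log shipping. The Python sketch below is a generic model, assuming that the zero-loss mode acknowledges a commit only after the standby has the record; the article does not state which replication mechanism GaussDB uses, so treat the class as hypothetical.

```python
class ReplicatedLog:
    """Toy primary/standby log used to count records that a failover would lose."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []
        self.standby = []
        self._pending = []

    def commit(self, record):
        self.primary.append(record)
        if self.synchronous:
            # Synchronous mode: the commit completes only once the standby has the record.
            self.standby.append(record)
        else:
            # Asynchronous mode: the record is shipped later, in the background.
            self._pending.append(record)

    def ship(self):
        """Deliver all pending records to the standby."""
        self.standby.extend(self._pending)
        self._pending.clear()

    def lost_on_failover(self):
        """Number of committed records missing on the standby right now."""
        return len(self.primary) - len(self.standby)
```

In synchronous mode `lost_on_failover()` is always 0, which corresponds to RPO = 0; in asynchronous mode it equals the number of records committed since the last shipment, which is what bounds a non-zero RPO.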


A single GaussDB (for openGauss) cluster supports 1000+ compute nodes and offers excellent linear scalability.


When shards are added to a single cluster, data is redistributed online automatically, and storage can scale to PB-level transaction volumes.

In summary, GaussDB (for openGauss) handles enterprise-level mixed transactional workloads and supports strongly consistent distributed transactions, cross-AZ deployment within one city with zero data loss, scale-out to 1000+ compute nodes, and PB-level mass storage. It also offers key capabilities such as high availability, high reliability, high security, elastic scaling, one-click deployment, fast backup and recovery, and monitoring and alerting, providing enterprises with a comprehensive, stable, reliable, scalable, high-performance enterprise-level database service. The service is now commercially available across the entire network. It is also an open ecosystem product: the source code of the single-shard version is open source, and the community address is https://opengauss.org. You are welcome to download, install, and try it yourself.
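One reason redistribution after a scale-out can stay cheap is that, with consistent hashing, adding a node only moves the keys that the new node takes over. The Python sketch below shows this property with a small hash ring. The ring layout, the number of virtual nodes, and the node names are assumptions for illustration and are not taken from GaussDB's implementation.

```python
import bisect
import hashlib


def _hash(value):
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (a generic textbook construction)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._hashes = []  # sorted ring positions
        self._owners = []  # node that owns each position
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            idx = bisect.bisect(self._hashes, point)
            self._hashes.insert(idx, point)
            self._owners.insert(idx, node)

    def owner(self, key):
        # The owner is the first ring position clockwise from the key's hash.
        idx = bisect.bisect(self._hashes, _hash(str(key))) % len(self._hashes)
        return self._owners[idx]


ring = HashRing(["dn1", "dn2", "dn3"])
before = {key: ring.owner(key) for key in range(2000)}
ring.add("dn4")  # scale out by one data node
after = {key: ring.owner(key) for key in range(2000)}
moved = [key for key in before if before[key] != after[key]]
```

Every key in `moved` now belongs to `dn4`; keys that stayed on `dn1`, `dn2`, or `dn3` do not need to be copied at all.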

P.S. If you missed the GaussDB (for openGauss) live broadcast, you can watch the replay here: https://bbs.huaweicloud.com/live/cloud_live/202103161900.html

This article is shared from the Huawei Cloud community post "Technical Live Broadcast Interpretation 1: Understanding Huawei Cloud Database GaussDB (for openGauss)". Original author: effort fat.

 


Origin blog.csdn.net/weixin_48967543/article/details/115216857