Abstract: This article introduces GaussDB (for openGauss) in terms of its overall architecture, main application scenarios, and key technical features.
1. Background introduction
On March 16, during "Understanding Huawei Cloud Database GaussDB (for openGauss)", the first live broadcast in the GaussDB (for openGauss) technology series hosted by Huawei Cloud, a viewer asked: open source databases are so attractive, so why did Huawei put so much effort into developing GaussDB (for openGauss) in-house?
In fact, many open source databases are weak in usability and supporting tooling, and they require constant maintenance. Moreover, once data is lost, it is difficult to recover quickly, and the resulting losses can be immeasurable. Open source databases on the cloud therefore mainly meet the needs of small and medium-sized enterprises, whose priorities are simplified deployment, operation and maintenance (O&M), and tuning, plus extreme cost-effectiveness.
Open source databases also carry many costs, large and small: servers, database maintenance and upgrades, and O&M staff. This makes it hard for them to keep up with rapid business expansion and sustainable growth. Large enterprises in sectors such as finance and government have stringent requirements for data security, response time, reliability, and availability, and they need enterprise-level database services with ultra-high availability, complete functionality, excellent performance, an open ecosystem, and a high degree of flexibility.
GaussDB (for openGauss) is an enterprise-level distributed relational database that Huawei built on the self-developed openGauss ecosystem. It draws on Huawei's many years of experience in the database field and is designed around the needs of enterprise scenarios. It currently supports both single-shard and distributed deployments. Beyond supporting traditional workloads, it continues to add competitive features to help enterprises meet the challenges of the 5G era.
To help everyone get to know GaussDB (for openGauss) quickly, the Huawei Cloud database team prepared a series of GaussDB (for openGauss) technical live broadcasts. This article covers the overall architecture, main scenarios, and key technical features presented in the first broadcast.
2. Overall architecture: unified distributed architecture based on data sharding
GaussDB (for openGauss) uses a unified distributed architecture based on data sharding (shared-nothing). Table data is spread across different data nodes according to a distribution rule such as hash, list, or range, and all of these data nodes take part in computation together. Data nodes can be added to scale out, while coordinator nodes at the upper layer parse SQL and forward it.
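To make the three distribution rules concrete, here is a minimal Python sketch of how a row's distribution-key value could be mapped to a data node under hash, range, and list distribution. The function names, the MD5-based hash, and the exclusive-upper-bound convention for ranges are illustrative assumptions, not GaussDB (for openGauss) internals.

```python
import hashlib
from bisect import bisect_right
from collections import defaultdict

# Hypothetical helpers for illustration only; GaussDB's actual hash
# function and routing code are not described in this article.

def stable_hash(value) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def hash_route(key, num_dns: int) -> int:
    """Hash distribution: spread keys evenly across num_dns data nodes."""
    return stable_hash(key) % num_dns

def range_route(key, upper_bounds: list) -> int:
    """Range distribution: upper_bounds[i] is the exclusive upper bound of DN i;
    keys >= the last bound go to the final DN, index len(upper_bounds)."""
    return bisect_right(upper_bounds, key)

def list_route(key, value_to_dn: dict) -> int:
    """List distribution: each DN owns an explicit set of values."""
    if key not in value_to_dn:
        raise KeyError(f"no data node is assigned to value {key!r}")
    return value_to_dn[key]

def distribute(rows, dist_col, route):
    """Group rows by the data node chosen for their distribution-key value."""
    placement = defaultdict(list)
    for row in rows:
        placement[route(row[dist_col])].append(row)
    return dict(placement)

if __name__ == "__main__":
    orders = [{"order_id": i, "amount": 10 * i} for i in range(8)]
    by_hash = distribute(orders, "order_id", lambda k: hash_route(k, 4))
    by_range = distribute(orders, "order_id", lambda k: range_route(k, [3, 6]))
    print(sorted(by_hash), by_range)
```

Hash distribution tends to balance data evenly, while range and list distribution keep related values together on one node at the cost of possible skew.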
As shown in the figure, the architecture mainly includes three types of nodes: coordinator nodes, data nodes, and cluster nodes (the most important of which is the global transaction manager). The coordinator node (CN) parses and forwards SQL, acting much like a proxy; the data node (DN) handles computation and data storage; and the global transaction manager (GTM) ensures read consistency for global transactions.
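The division of labor between CN and DN can be illustrated with a toy in-memory model. In this sketch, which is an assumption-laden illustration rather than how GaussDB is implemented, the coordinator hash-distributes rows by a distribution column. A query whose predicate pins the distribution key to one value is forwarded to a single DN, while any other query fans out to every DN and the CN merges the partial results. The class and method names are hypothetical, and the GTM is omitted.

```python
import hashlib

# Toy model only: DataNode and Coordinator are hypothetical names,
# not GaussDB (for openGauss) classes.

def dn_index(key, num_dns: int) -> int:
    """Deterministic hash of the distribution-key value, mapped to a DN."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_dns

class DataNode:
    """Stores its shard of the table and evaluates predicates locally."""
    def __init__(self):
        self.rows = []

    def scan(self, predicate):
        return [row for row in self.rows if predicate(row)]

class Coordinator:
    """Routes writes and queries; holds no table data itself."""
    def __init__(self, num_dns: int, dist_col: str):
        self.dns = [DataNode() for _ in range(num_dns)]
        self.dist_col = dist_col

    def insert(self, row):
        self.dns[dn_index(row[self.dist_col], len(self.dns))].rows.append(row)

    def select(self, predicate, dist_key=None):
        """Return (rows, number_of_DNs_contacted).

        If the caller knows the query fixes the distribution key to a single
        value, forward it to the one DN that owns that value; otherwise send
        it to all DNs and merge their partial results on the CN.
        """
        if dist_key is not None:
            targets = [self.dns[dn_index(dist_key, len(self.dns))]]
        else:
            targets = self.dns
        merged = []
        for dn in targets:
            merged.extend(dn.scan(predicate))
        return merged, len(targets)

if __name__ == "__main__":
    cn = Coordinator(num_dns=4, dist_col="user_id")
    for uid in range(100):
        cn.insert({"user_id": uid, "score": uid % 7})
    single, contacted = cn.select(lambda r: r["user_id"] == 42, dist_key=42)
    print(single, contacted)  # [{'user_id': 42, 'score': 0}] 1
    fanout, contacted = cn.select(lambda r: r["score"] == 0)
    print(len(fanout), contacted)
```

Routing a single-key query to one DN avoids contacting the whole cluster, which is why choosing a distribution column that matches common query predicates matters.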
This architecture builds the following core advantages for GaussDB (for openGauss):
- Extremely high availability: two-region, three-data-center architecture with real-time cross-region disaster recovery
- Data security: strong data consistency across AZ deployments with zero data loss
- High scalability: containerized deployment; performance and capacity scale out horizontally on demand, up to 1000+ nodes
- Strong performance: 12 million tpmC on 32 nodes built from Kunpeng 2-socket servers (Huawei internal test)
- Fully self-developed, controllable software and hardware stack: industry-leading Kunpeng hardware plus the self-developed, open openGauss kernel
3. Main scenarios
Scenario 1: Traditional core transaction systems
Traditional applications can use single-shard mode, which works like the traditional active-standby deployment. Thanks to deep optimization for Kunpeng, GaussDB (for openGauss) delivers outstanding performance and much-improved usability in this mode, which makes it well suited to replacing traditional commercial databases.
Scenario 2: Future massive-scale transaction systems
With the arrival of the 5G era, a single node struggles to handle ever-growing data volumes while maintaining performance, whereas a horizontally scalable, cross-node database is well suited to computing over and storing massive data. The distributed mode of GaussDB (for openGauss) supports up to 1000+ nodes, PB-level storage, and strongly consistent distributed transactions, meeting the "Internet+" requirements of industries such as government, transportation, finance, and energy.
Key roles
To help readers understand how GaussDB (for openGauss) operates, some of its key roles are introduced below:
4. Key technical features
GaussDB (for openGauss) is based on a distributed architecture that separates computing from storage and provides six core technical features, which are explained in detail below.
Key Technology One: High Performance - Distributed Execution Framework
The general execution flow of this feature is:
- The business application sends SQL to the Coordinator; the SQL can include CRUD operations on data;
- The Coordinator uses the database optimizer to generate an execution plan, and each DN processes data as the plan requires;
- Data is distributed across DNs using a consistent hashing algorithm, so a DN may need data held by other DNs while processing. GaussDB provides three types of streams (broadcast, aggregation, and redistribution) to move data between DNs;
- Each DN returns its result set to the Coordinator for aggregation;
- The Coordinator returns the aggregated result to the business application.
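The three stream types can be illustrated with a small two-phase aggregation, roughly what a query such as `SELECT dept, SUM(salary) FROM emp GROUP BY dept` needs when `emp` is distributed by a column other than `dept`. In this hypothetical Python sketch, each DN first computes partial sums locally; a redistribution stream sends each group's partial sums to the DN that owns that group, where they are combined; and an aggregation (gather) stream returns the per-DN results to the Coordinator. A broadcast helper is included for completeness. Function names, the sample rows, and the hash choice are assumptions for illustration, not GaussDB code.

```python
import hashlib
from collections import defaultdict

# Hypothetical sketch of broadcast, redistribution, and aggregation (gather)
# streams; not GaussDB's executor code.

def dn_index(key, num_dns: int) -> int:
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_dns

def local_partial_sum(rows, group_col, value_col):
    """Phase 1 on each DN: aggregate only the rows stored locally."""
    acc = defaultdict(int)
    for row in rows:
        acc[row[group_col]] += row[value_col]
    return dict(acc)

def redistribute_and_combine(partials_per_dn, num_dns):
    """Redistribution stream: every (group, partial_sum) pair is sent to the DN
    that owns the group by hash; the receiving DN adds the partials together,
    so each group ends up fully aggregated on exactly one DN."""
    received = [defaultdict(int) for _ in range(num_dns)]
    for partial in partials_per_dn:
        for group, subtotal in partial.items():
            received[dn_index(group, num_dns)][group] += subtotal
    return [dict(r) for r in received]

def gather(results_per_dn):
    """Aggregation (gather) stream: the CN collects the disjoint per-DN results."""
    final = {}
    for part in results_per_dn:
        final.update(part)
    return final

def broadcast(rows, num_dns):
    """Broadcast stream: give every DN a full copy of a (small) table."""
    return [list(rows) for _ in range(num_dns)]

if __name__ == "__main__":
    # emp rows as stored on three DNs (distributed by employee id, not dept).
    shards = [
        [{"dept": "sales", "salary": 10}, {"dept": "hr", "salary": 5}],
        [{"dept": "sales", "salary": 1}],
        [{"dept": "dev", "salary": 7}, {"dept": "hr", "salary": 2}],
    ]
    partials = [local_partial_sum(s, "dept", "salary") for s in shards]
    combined = redistribute_and_combine(partials, num_dns=len(shards))
    print(gather(combined))
```

Aggregating locally before redistributing means only one partial result per group per DN crosses the network, rather than every raw row.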
Huawei has many years of experience in SQL execution optimization, so even complex SQL and hybrid transactional/analytical processing (HTAP) workloads can be executed with well-optimized plans, for example through:
- Cost-based optimization
  - Cardinality estimation: feedback-based enhancement, AI-enhanced cardinality estimation
  - Cost estimation: row-store/column-store cost estimation, network communication cost estimation
  - Search algorithms: dynamic programming, genetic algorithm, AI-based search
- Distributed execution plan capabilities
  - Light Proxy
  - Fast Query Shipping
  - Remote Query Shipping
- Self-developed Cascades optimizer
  - Object processing via rule-application and search tasks
  - Branch-and-bound pruning
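The "dynamic programming" search listed under cost-based optimization can be illustrated with a classic System R-style enumeration of left-deep join orders. This hypothetical Python sketch estimates each intermediate result's cardinality from base-table row counts and join selectivities (assuming independent predicates) and uses the sum of intermediate result sizes as the plan cost. The statistics are made up, and the sketch shows a textbook technique, not GaussDB's optimizer, which the article describes as a Cascades-style optimizer with branch-and-bound pruning.

```python
from itertools import combinations
from math import prod

# Textbook dynamic-programming join enumeration (left-deep trees).
# The cost model and statistics are illustrative assumptions.

def estimate_rows(tables, card, selectivity):
    """Cardinality of joining `tables`: product of base cardinalities times
    the selectivity of every join predicate whose tables are both present."""
    rows = prod(card[t] for t in tables)
    for (a, b), sel in selectivity.items():
        if a in tables and b in tables:
            rows *= sel
    return rows

def best_left_deep_plan(card, selectivity):
    """Return (cost, join_order) minimizing the sum of intermediate result sizes."""
    names = sorted(card)
    # best[subset] = (cost, order); scanning a single table costs 0 here.
    best = {frozenset([t]): (0.0, [t]) for t in names}
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            subset = frozenset(combo)
            out_rows = estimate_rows(subset, card, selectivity)
            # Try every table as the last (right-hand) input of a left-deep join,
            # reusing the already-computed best plan for the remaining tables.
            candidates = []
            for last in combo:
                sub_cost, sub_order = best[subset - {last}]
                candidates.append((sub_cost + out_rows, sub_order + [last]))
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(names)]

if __name__ == "__main__":
    card = {"orders": 1_000_000, "customers": 10_000, "nation": 25}
    selectivity = {("customers", "nation"): 1 / 25, ("customers", "orders"): 1 / 10_000}
    cost, order = best_left_deep_plan(card, selectivity)
    print(order, round(cost))
```

With these numbers the search avoids the `nation` x `orders` cross product and joins the two small tables first; the number of subsets grows exponentially, which is why optimizers switch to heuristics such as genetic search for queries with many tables.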
Through its distributed query engine, distributed scheduling engine, and distributed storage engine, GaussDB (for openGauss) shards data automatically and uses the query optimizer to keep improving execution-plan efficiency while balancing load automatically. Data nodes use different stream types (broadcast, aggregation, and redistribution) for different data scenarios, which makes interaction between sharded data nodes more efficient, and results are aggregated automatically while the global consistency of distributed transactions is guaranteed.
Key Technology Two: High Performance - Distributed Transaction Processing with GTM-Lite
The advantages of this feature are:
- High-performance transaction management: supports lock-free, multi-version, high-concurrency transaction techniques.
- Distributed strong consistency: the distributed GTM-Lite solution provides global transaction snapshots and commit sequence number management, achieving strong consistency without a central-node performance bottleneck.
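Global snapshots combined with commit sequence numbers can be sketched as CSN-based MVCC visibility: a central counter hands out a commit sequence number (CSN) to each committing transaction, every data node tags the versions it stores with that CSN, and a reader sees only versions whose CSN is no greater than its snapshot CSN. The Python below is a simplified, single-process illustration under that assumption; the class names are hypothetical, commits are treated as atomic, and it does not model how GTM-Lite avoids a central bottleneck.

```python
import threading

# Simplified CSN-based visibility; GtmLiteSketch and MvccDataNode are
# hypothetical names, not openGauss classes.

class GtmLiteSketch:
    """Hands out monotonically increasing commit sequence numbers (CSNs)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._latest_csn = 0

    def next_csn(self) -> int:
        with self._lock:
            self._latest_csn += 1
            return self._latest_csn

    def snapshot(self) -> int:
        """In this sketch a snapshot is just the latest CSN handed out so far."""
        with self._lock:
            return self._latest_csn

class MvccDataNode:
    """Keeps every committed version of each key, tagged with its CSN."""
    def __init__(self):
        self.versions = {}  # key -> list of (csn, value)

    def install(self, key, value, csn):
        self.versions.setdefault(key, []).append((csn, value))

    def read(self, key, snapshot_csn):
        visible = [v for v in self.versions.get(key, []) if v[0] <= snapshot_csn]
        return max(visible, key=lambda v: v[0])[1] if visible else None

def commit(gtm, writes):
    """Commit a transaction whose writes span several DNs: one CSN for all of them."""
    csn = gtm.next_csn()
    for dn, key, value in writes:
        dn.install(key, value, csn)
    return csn

if __name__ == "__main__":
    gtm, dn1, dn2 = GtmLiteSketch(), MvccDataNode(), MvccDataNode()
    commit(gtm, [(dn1, "x", 1), (dn2, "y", 1)])
    old_snapshot = gtm.snapshot()
    commit(gtm, [(dn1, "x", 2), (dn2, "y", 2)])
    # The old snapshot sees the first transaction on both DNs, never a mix.
    print(dn1.read("x", old_snapshot), dn2.read("y", old_snapshot))  # 1 1
    new_snapshot = gtm.snapshot()
    print(dn1.read("x", new_snapshot), dn2.read("y", new_snapshot))  # 2 2
```

Because every DN compares versions against the same snapshot number, a reader never observes one DN's half of a distributed transaction without the other half.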
Key Technology Three: High Performance - Scale-Up with a NUMA-Aware Architecture for Kunpeng 4P Servers
Based on the multi-core NUMA architecture of Kunpeng processors, GaussDB (for openGauss) applies a series of NUMA-aware optimizations. It binds threads to cores to avoid cross-NUMA-node memory access and reduce latency, and it uses key techniques such as batched redo log insertion, NUMA-aware distribution of hot data, and Clog partitioning to make full use of multi-core computing power while reducing access latency, log write conflicts, and index update conflicts. In TPC-C stress tests on TaiShan Kunpeng servers, performance is currently 1.5 times that of x86 servers of the same specification.
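Clog (commit log) partitioning is one of the techniques listed for reducing contention. A generic way to express the idea is lock striping: split the transaction-status table into several partitions, each guarded by its own lock, so that concurrent transactions usually update different partitions instead of serializing on one global lock. The Python sketch below is a hypothetical illustration of that general pattern, mapping transaction IDs to partitions by modulo; the article does not describe how GaussDB actually lays out or partitions its Clog, and Python threads are used only to show the locking structure, not to demonstrate a speed-up.

```python
import threading

# Lock striping applied to a transaction-status table. PartitionedClog is a
# hypothetical name; the modulo mapping is an assumption for illustration.

IN_PROGRESS, COMMITTED, ABORTED = "in_progress", "committed", "aborted"

class PartitionedClog:
    def __init__(self, num_partitions: int = 8):
        self._locks = [threading.Lock() for _ in range(num_partitions)]
        self._status = [{} for _ in range(num_partitions)]

    def _partition(self, xid: int) -> int:
        # Consecutive XIDs (typical of concurrent transactions) land in
        # different partitions, so their status updates take different locks.
        return xid % len(self._locks)

    def set_status(self, xid: int, status: str) -> None:
        p = self._partition(xid)
        with self._locks[p]:
            self._status[p][xid] = status

    def get_status(self, xid: int) -> str:
        p = self._partition(xid)
        with self._locks[p]:
            return self._status[p].get(xid, IN_PROGRESS)

    def partition_sizes(self):
        return [len(s) for s in self._status]

def run_workers(clog: PartitionedClog, num_threads: int = 4, per_thread: int = 1000):
    """Each worker commits its own slice of XIDs concurrently."""
    def worker(start: int):
        for xid in range(start, start + per_thread):
            clog.set_status(xid, COMMITTED)
    threads = [threading.Thread(target=worker, args=(i * per_thread,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    clog = PartitionedClog(num_partitions=8)
    run_workers(clog)
    print(clog.get_status(123), clog.get_status(999_999), clog.partition_sizes())
```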
Key Technology Four: High Availability - Cluster HA with Multi-Level Redundancy and No Single Point of Failure
GaussDB (for openGauss) eliminates single points of failure across the system's software and hardware through hardware redundancy, instance redundancy, and data redundancy. Unlike traditional database products, it relies mainly on software capabilities to provide high availability and high reliability. On top of this software and hardware foundation, Huawei Cloud delivers end-to-end database high availability together with end-to-end monitoring and detection across all scenarios, so it can keep users' applications online more promptly and reliably, with zero data loss and no single point of failure across the full stack.
High-availability technical points
Hardware high availability:
- Storage: disk RAID redundancy.
- Network: dual-switch redundancy.
- Network card: multiple-NIC redundancy.
- Host: UPS power protection.
Software high availability:
- Coordinator node (CN) instances: multi-active redundancy
- Data node / global transaction manager / cluster manager instances: active-standby redundancy
Fault handling:
- Network fault detection and handling (switches, routers, etc.)
- NIC failure detection and handling (local NIC failure detection)
- Disk failure detection and handling: disk heartbeat, handling of error codes returned by the file system
- Host power failure detection and handling: heartbeat mechanism
- Cluster instance failure detection and handling (abnormal termination of CN/DN/GTM processes)
- Cluster software failure
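Several of the detection mechanisms above rely on heartbeats. The following hypothetical Python sketch shows the basic pattern for an active-standby instance pair: each instance reports heartbeats, an instance whose last heartbeat is older than a timeout is treated as failed, and a failed primary triggers promotion of a healthy standby. The timeout value, the class names, and the explicit `now` parameter (used instead of a real clock so the logic is easy to test) are assumptions; the article does not describe the cluster manager's actual detection or arbitration logic.

```python
# Heartbeat-based failure detection with active-standby failover.
# HeartbeatMonitor and ActiveStandbyPair are hypothetical names.

class HeartbeatMonitor:
    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen = {}  # instance name -> time of last heartbeat

    def heartbeat(self, instance: str, now: float) -> None:
        self.last_seen[instance] = now

    def failed(self, now: float) -> set:
        """Instances whose last heartbeat is older than the timeout."""
        return {name for name, t in self.last_seen.items() if now - t > self.timeout_s}

class ActiveStandbyPair:
    def __init__(self, primary: str, standby: str):
        self.primary, self.standby = primary, standby

    def failover_if_needed(self, failed: set) -> bool:
        """Promote the standby if the primary failed and the standby is healthy.
        The old primary is recorded as the standby, but it must be repaired
        and resynchronized before it can really serve in that role."""
        if self.primary in failed and self.standby not in failed:
            self.primary, self.standby = self.standby, self.primary
            return True
        return False

if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=3.0)
    pair = ActiveStandbyPair(primary="dn1_a", standby="dn1_b")
    monitor.heartbeat("dn1_a", now=0.0)
    monitor.heartbeat("dn1_b", now=0.0)
    monitor.heartbeat("dn1_b", now=2.0)  # dn1_a falls silent after t=0
    down = monitor.failed(now=4.0)
    promoted = pair.failover_if_needed(down)
    print(down, promoted, pair.primary)  # {'dn1_a'} True dn1_b
```

The timeout trades detection speed against false alarms: a shorter timeout lowers recovery time but risks failing over on a brief network hiccup.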
Key Technology Five: High Availability - Cross-AZ/Cross-Region Disaster Recovery
GaussDB (for openGauss) currently supports single-cluster, same-city, cross-AZ active-active deployment with RPO = 0 and RTO < 60 s, as well as dual-cluster, cross-region disaster recovery in a two-region, three-data-center setup with RPO < 10 s and RTO < 10 min. While providing cross-region disaster recovery, this solution allows the number of disaster recovery nodes to be kept to a minimum, which effectively reduces users' disaster recovery costs. It also lets users scale out the disaster recovery nodes online after the disaster recovery cluster is promoted to primary in a failure scenario, keeping the business running and improving the reliability and availability of the user's original disaster recovery instance.
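RPO (recovery point objective) bounds how much recently committed data can be lost in a disaster, and RTO (recovery time objective) bounds how long recovery takes. The toy Python model below shows why the replication mode drives RPO: with synchronous replication a commit is acknowledged only after the standby has it, so nothing acknowledged is lost, while with asynchronous replication the commits still in flight when the primary fails are lost, and the loss window stays below the replication lag. The fixed-lag model and the function name are assumptions; the article does not state which replication modes GaussDB uses to reach its figures.

```python
# Toy model of data loss at failover; lost_on_failover is a hypothetical name.

def lost_on_failover(commit_times, failure_time, mode, lag_s=0.0):
    """Return (lost_commits, loss_window_s) if the primary fails at failure_time.

    commit_times: times (seconds) at which the primary acknowledged commits.
    mode="sync":  a commit is acknowledged only after the standby has it.
    mode="async": the standby receives each commit lag_s seconds after it
                  is acknowledged (fixed lag, an illustrative assumption).
    """
    acked = [t for t in commit_times if t <= failure_time]
    if mode == "sync":
        lost = []
    elif mode == "async":
        # Commits whose copy had not reached the standby before the failure.
        lost = [t for t in acked if t + lag_s > failure_time]
    else:
        raise ValueError("mode must be 'sync' or 'async'")
    # Loss window: from the oldest lost commit up to the failure.
    window = failure_time - min(lost) if lost else 0.0
    return len(lost), window

if __name__ == "__main__":
    commits = [float(i) for i in range(1, 11)]  # one commit per second
    print(lost_on_failover(commits, 10.0, "sync"),
          lost_on_failover(commits, 10.0, "async", lag_s=2.5))  # (0, 0.0) (3, 2.0)
```

Synchronous shipping buys zero loss at the price of commit latency that includes the round trip to the standby, which is why it is typically used between nearby AZs rather than across distant regions.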
Key Technology Six: High Scalability - Online Horizontal Scale-Out
A single GaussDB (for openGauss) cluster supports up to 1000+ compute nodes with excellent linear scalability.
Shard expansion within a single cluster supports automatic online data redistribution, providing PB-level storage expansion for massive transaction data.
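When shards are added, a good placement scheme moves only the data that has to move. The article does not say how GaussDB decides which data to redistribute, but since it mentions consistent hashing for data distribution, the hypothetical Python sketch below compares a consistent-hash ring (with virtual nodes) against plain modulo hashing when a fifth data node joins four existing ones. With the ring, only keys that now belong to the new node move; with modulo hashing, most keys change nodes. The names, the number of virtual nodes, and MD5 as the hash are assumptions.

```python
import hashlib
from bisect import bisect_right

# Consistent hashing with virtual nodes; HashRing is a hypothetical name.

def h64(text: str) -> int:
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes_per_node: int = 100):
        self.vnodes_per_node = vnodes_per_node
        self._ring = []    # sorted (hash, node) pairs
        self._hashes = []  # hash values only, for bisect
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns many points on the ring to even out load.
        self._ring.extend((h64(f"{node}#{i}"), node) for i in range(self.vnodes_per_node))
        self._ring.sort()
        self._hashes = [hv for hv, _ in self._ring]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wrapping around).
        idx = bisect_right(self._hashes, h64(key)) % len(self._ring)
        return self._ring[idx][1]

def scale_out_demo(num_keys: int = 20_000):
    """Add dn4 to a four-node cluster and measure how many keys move."""
    keys = [f"row-{i}" for i in range(num_keys)]
    ring = HashRing([f"dn{i}" for i in range(4)])
    before = {k: ring.owner(k) for k in keys}
    ring.add_node("dn4")
    moved = [k for k in keys if ring.owner(k) != before[k]]
    ring_fraction = len(moved) / num_keys
    only_to_new_node = all(ring.owner(k) == "dn4" for k in moved)
    modulo_fraction = sum(h64(k) % 4 != h64(k) % 5 for k in keys) / num_keys
    return ring_fraction, only_to_new_node, modulo_fraction

if __name__ == "__main__":
    ring_fraction, only_new, modulo_fraction = scale_out_demo()
    print(f"ring moved {ring_fraction:.1%} (all to dn4: {only_new}); "
          f"modulo moved {modulo_fraction:.1%}")
```

Expect the ring to move roughly one fifth of the keys, all of them to the new node, while modulo hashing moves about four fifths; moving less data is what makes online redistribution cheaper.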
In summary, GaussDB (for openGauss) offers enterprise-level mixed transactional workload capabilities, strongly consistent distributed transactions, same-city cross-AZ deployment with zero data loss, expansion to 1000+ compute nodes, and PB-level mass storage. It also provides key capabilities such as high availability, high reliability, high security, elastic scaling, one-click deployment, fast backup and recovery, and monitoring and alarms, giving enterprises comprehensive, stable, reliable, scalable, high-performance enterprise-level database services. The service is now commercially available network-wide. It is also an open-ecosystem product: the source code of the single-shard version is open source, and the community address is https://opengauss.org. You are welcome to download, install, and try it yourself.
P.S. If you missed the GaussDB (for openGauss) live broadcast, you can watch the replay here: https://bbs.huaweicloud.com/live/cloud_live/202103161900.html
This article is shared from the Huawei Cloud community post "Technology Live Broadcast Interpretation 1: Recognizing Huawei Cloud Database GaussDB (for openGauss)". Original author: effort fat.