HCIP Study Notes-Database Service Planning-5

1. Overview of database services

1.1 Development trend of database

image.png

  • Data volumes are growing explosively and data application patterns keep diversifying. With the large-scale adoption of cloud computing, the traditional way of running database workloads has changed.

1.2 Advantages of Cloud Database

image.png

  • Compared with traditional databases, cloud databases generally have the following advantages:
    • Ease of use: cloud databases are provided as a cloud service; like other cloud services, they can be deployed and running quickly and largely eliminate day-to-day operation and maintenance.
    • High scalability: designed for the cloud, built on an open architecture with compute and storage separated, so they scale much more easily.
    • Low cost: lower software and hardware costs than traditional commercial databases, plus a low total cost of ownership thanks to pay-per-use billing and on-demand provisioning.

1.3 Database classification: SQL & NoSQL

image.png

  • Relational database: a database that organizes data with the relational model, i.e. the two-dimensional table model; a relational database is made up of two-dimensional tables and the relationships between them.
  • Non-relational database (NoSQL): a non-relational, usually distributed data store that generally does not guarantee ACID properties (a minimal code illustration of the two models follows this list).
  • Common products:
    • Relational database: SQL Server, MySQL, PostgreSQL.
    • Non-relational databases: Redis, Memcached, MongoDB.
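The distinction above can be made concrete with a minimal sketch: the relational model stores rows in a fixed-schema table, while a document store keeps self-describing records that need not share fields. The snippet below uses Python's built-in sqlite3 as a stand-in for any relational engine and a plain list of dicts to mimic a document collection; it is only an illustration and is not tied to any particular cloud service.

```python
import sqlite3

# Relational model: data lives in two-dimensional tables with a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT, points INTEGER)")
conn.execute("INSERT INTO player (id, name, points) VALUES (?, ?, ?)", (1, "alice", 120))
row = conn.execute("SELECT name, points FROM player WHERE id = ?", (1,)).fetchone()
print(row)  # ('alice', 120)

# Document (non-relational) model: each record is a self-describing document,
# so two records need not share the same fields (schema-free).
players = [
    {"_id": 1, "name": "alice", "points": 120},
    {"_id": 2, "name": "bob", "equipment": ["sword", "shield"]},  # different fields, no ALTER TABLE
]
print(players[1]["equipment"])
```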

1.4 HUAWEI CLOUD database panorama

image.png

  • Databases are divided into relational databases and non-relational databases:
  • Relational database: RDS for MySQL, RDS for PostgreSQL, RDS for SQL Server, GaussDB (for openGauss), GaussDB (for MySQL).
  • Non-relational databases: GaussDB (for Mongo), GaussDB (for Cassandra), GaussDB (for Redis), GaussDB (for Influx), DDS, DCS.
  • Database ecosystem services mainly include DDM, DRS, and UGO. Distributed Database Middleware (DDM) focuses on the scale-out problem of distributed databases, breaking through the capacity and performance bottlenecks of traditional databases to enable highly concurrent access to massive data.
  • DDM is a cloud-native distributed database middleware developed by HUAWEI CLOUD. It adopts a storage-compute separation architecture and provides capabilities such as sharding (splitting databases and tables), read/write splitting, and elastic scaling. It is stable, reliable, highly scalable, and continuously maintainable. Management of the underlying server cluster is completely transparent to users: users perform database operation and maintenance and read/write data through the DDM console, with an experience similar to a traditional standalone database.
  • Product advantages: automatic sharding of databases and tables, read/write splitting, and elastic scaling.

2. Comparison and selection of database services on the cloud

2.1 Relational database service

2.1.1 SQL Design Principles on the Cloud

image.png

  • Scenario introduction:
    • Small systems and peripheral applications: QPS on the order of 100,000; small OLTP; data volume of tens to hundreds of GB.
    • Enterprise-level applications: QPS on the order of millions; medium-sized OLTP; data volume of TB to tens of TB.
    • Core high-concurrency business systems: ultra-large OLTP with mixed workloads, natively distributed; data volume above ten TB.

2.1.1.1 Cloud database RDS

image.png

  • Security:
    • RDS instances run in a Virtual Private Cloud dedicated to the tenant, which improves instance security. Users can combine subnet and security group configuration to isolate RDS instances.
  • Access control:
    • When an RDS instance is created, the RDS service creates a master database account for the tenant; database instances and sub-accounts can then be created as required and database objects assigned to the sub-accounts, achieving separation of permissions.
  • Transmission encryption:
    • Download the CA root certificate from the service console and provide it when connecting to the database, so that the database server is authenticated and the connection is encrypted (a connection sketch follows this list).
  • Storage encryption:
    • The RDS service supports encrypting the data stored in the database; the encryption key is managed by the KMS of the data encryption service.
  • Data deletion:
    • Secure deletion covers not only the disks attached to the database instance but also the storage space used for automatic backups. Data of a deleted instance can be restored from retained manual backups, or, within the recycle-bin retention period, by rebuilding the instance.
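As a concrete illustration of the transmission-encryption point above, the sketch below opens a TLS-protected connection to a MySQL-compatible RDS instance using the CA root certificate downloaded from the console. The hostname, credentials, database name, and certificate path are placeholders, and PyMySQL is just one possible client library.

```python
import pymysql

# Hypothetical endpoint, credentials, and certificate path; replace with your own.
conn = pymysql.connect(
    host="rds-instance.example.internal",      # private IP / domain of the RDS instance
    port=3306,
    user="app_user",
    password="********",
    database="appdb",
    ssl={"ca": "/etc/ssl/certs/rds-ca.pem"},    # CA root certificate from the console
)
try:
    with conn.cursor() as cur:
        cur.execute("SHOW STATUS LIKE 'Ssl_cipher'")  # non-empty value => TLS in use
        print(cur.fetchone())
finally:
    conn.close()
```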

2.1.1.2 Engines supported by RDS service

image.png

  • RDS for MySQL:
    • The architecture is mature and stable, supports mainstream applications, and fits many fields and industries; it supports all kinds of web applications at low cost and is the first choice for small and medium-sized enterprises.
    • The management console provides comprehensive monitoring information; the service is easy to use, flexible to manage, and visible and controllable.
    • Resources can be flexibly scaled up or down at any time according to business needs, with pay-as-you-go, tailored spending.
  • RDS for PostgreSQL:
    • Supports the PostGIS plug-in, with excellent spatial/GIS capabilities.
    • Rich application scenarios at low cost; resources can be flexibly scaled at any time according to business needs, provisioned on demand and tailored to the workload.
  • RDS for SQL Server:
    • RDS for SQL Server is stable, reliable, secure, elastically scalable, easy to manage, and cost-effective. It offers a high-availability architecture, data security guarantees, and second-level recovery from failures, with flexible backup solutions.

2.1.2 Cloud database RDS for MySQL

image.png

  • Database type and version: MySQL 5.6, 5.7, 8.0.
  • Data Security: Multiple security policies protect database and user privacy.
  • High data reliability: database storage keeps at least three copies; database data reliability reaches 99.9999999% (9 nines) and backup data reliability reaches 99.999999999% (11 nines).
  • High service availability (intra-city disaster recovery): primary and standby instances can be deployed within an AZ or across AZs, with service availability above 99.95%.
  • Instance access: multiple access methods, including private IP access, public IP access, and VPN access.
  • Instance management: supports lifecycle operations such as creating, deleting, modifying, querying, and restarting instances.
  • Elastic scaling: horizontal scaling by adding or deleting read replicas (up to 5); vertical scaling by changing instance specifications and expanding storage space (up to 10 TB).
  • Backup and recovery: backup covers automatic backups, manual backups, full backups, incremental backups, and lifecycle management of backup files (add, delete, copy). Recovery supports Point-In-Time Recovery (PITR) to any point within the backup retention period or to a full-backup time point, restoring to a new instance or to the original instance. The backup retention period can be up to 732 days.

2.1.2.1 Cross-AZ High Availability

image.png

  • When creating a database instance, users can choose the primary/standby instance type. If the primary database fails, the service automatically switches to the standby database to keep serving traffic; if the standby database also fails, access automatically switches to the primary/standby pair in another availability zone.
  • Combined with DDM, RDS supports creating up to 5 read replicas. The primary/standby pair handles writes, while read replicas serve only reads, so that traffic is split automatically.
  • The primary/standby pair exposes a virtual IP (VIP). When the VIP is bound to database 1, that database is the primary. If the primary fails, the VIP floats to the standby, which becomes the new primary. The VIP drifts internally within seconds, the service keeps running, and the application side is completely unaware (a client-side retry sketch follows this list).
  • Restriction: read replicas can only be created after the database instance has been purchased.
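Because failover only moves the VIP, the application keeps connecting to the same address; at most it sees a brief connection error during the seconds-level drift. A minimal, hypothetical client-side pattern is to retry the connection a few times, as sketched below (host, credentials, and database name are placeholders; PyMySQL is just one possible driver).

```python
import time
import pymysql

VIP_HOST = "192.0.2.10"  # hypothetical virtual IP exposed by the primary/standby pair

def execute_with_retry(sql, params=None, retries=3, delay=2):
    """Run a statement through the VIP; after a failover the same address reaches the new primary."""
    last_err = None
    for _ in range(retries):
        try:
            conn = pymysql.connect(host=VIP_HOST, user="app_user", password="********",
                                   database="appdb", connect_timeout=5)
            try:
                with conn.cursor() as cur:
                    cur.execute(sql, params)
                    rows = cur.fetchall()
                conn.commit()
                return rows
            finally:
                conn.close()
        except pymysql.err.OperationalError as err:
            last_err = err
            time.sleep(delay)  # wait out the seconds-level VIP drift, then retry
    raise last_err

print(execute_with_retry("SELECT 1"))
```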

2.1.2.2 Read and write separation

image.png

  • After a read replica is created, incoming requests are first classified as writes or reads. Write requests are routed to the primary/standby databases to complete the write operation; read requests are routed to the read replica to complete the read, as sketched below.
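A read/write splitting proxy (such as DDM or the service's read/write splitting address) performs this routing automatically; the sketch below only illustrates the routing logic at the application level. The endpoints and credentials are hypothetical, and the "SELECT goes to a replica" rule is a deliberate simplification.

```python
import itertools
import pymysql

# Hypothetical endpoints: one primary (writes) and a pool of read replicas (reads).
PRIMARY = dict(host="primary.example.internal", user="app", password="***", database="appdb")
REPLICAS = [
    dict(host="replica-1.example.internal", user="app", password="***", database="appdb"),
    dict(host="replica-2.example.internal", user="app", password="***", database="appdb"),
]
_replica_cycle = itertools.cycle(REPLICAS)

def run(sql, params=None):
    """Route writes to the primary and reads round-robin to a read replica."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    target = next(_replica_cycle) if is_read else PRIMARY
    conn = pymysql.connect(**target)
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            rows = cur.fetchall() if is_read else None
        conn.commit()
        return rows
    finally:
        conn.close()

run("INSERT INTO orders (item) VALUES (%s)", ("book",))  # goes to the primary
run("SELECT COUNT(*) FROM orders")                        # goes to a read replica
```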

2.1.2.3 High data security

image.png

  • Supports a user-defined number of days for retaining automatic backups (the backup retention period; value range 0-732).

2.1.2.4 Kernel Optimization

image.png

2.1.2.5 Case

image.png

2.1.3 Cloud database RDS for PostgreSQL

image.png

  • Database versions: 9.5, 9.6, 10, 11, and 12 are supported.
  • Security: multiple security policies protect the database and user privacy.
  • Data migration: supports online and offline migration onto and off the cloud and across clouds.
  • High availability: data of the primary instance is replicated to a standby instance; if the primary instance fails and becomes unavailable, the service switches to the standby instance within a short time.
  • Monitoring: supports monitoring of key performance indicators of the database instance and database engine, including compute/memory/storage usage, I/O activity, number of database connections, QPS/TPS, buffer pool, and read/write activity.
  • Elastic scaling: horizontal scaling by adding or deleting read replicas (up to 5 per instance); vertical scaling by changing instance specifications, with one-click expansion and no business interruption.
  • Backup and restore: supports automatic and manual backups; restore supports restoring to a backup file point in time.

2.1.3.1 High Reliability and High Availability

image.png

  • PostgreSQL supports cross-AZ high availability. If the primary fails, fault detection attempts to restart it up to three times; if it cannot be restarted, an automatic failover is triggered and the primary role switches to the standby. The application is automatically reconnected to the new primary, and the switchover completes within seconds.
  • HUAWEI CLOUD databases provide data backup and restore capabilities. Users can configure an automatic backup policy with daily automatic backups and a backup retention period of up to 732 days; in addition, incremental backups are taken every five minutes to ensure data reliability.
  • If data becomes abnormal or is deleted by mistake, the database can be restored to any previous point in time.
  • Backup files are stored in OBS, which has no capacity ceiling and provides 11 nines (99.999999999%) of data durability.

2.1.4 Database selection

image.png

2.1.5 Database comparison

image.png

  • Explanation: color similarity indicates how well a database fits a scenario; blue here means that both PG and MySQL can be used in most scenarios.
    • How heavily the database is used and the team's architectural habits matter. Some game and Internet companies treat the database purely as a data storage tool ("light database, heavy application"); in that case either PG or MySQL works. If many functions depend on database features, PG is recommended. A database is basic infrastructure software that must be stable and reliable, and open-source databases are independently controllable, so relying on such basic software is reasonable.
    • Is the business purely transactional, or mixed transactional and analytical? For the former, follow the company's existing habits; for the latter, PG is recommended because its analytical capabilities are very strong.
    • If stored procedures are used heavily, PG is recommended; otherwise follow the company's habits.
    • If heterogeneous database access is required, PG is recommended: its Foreign Data Wrapper lets users access data outside PG through SQL (see the sketch after this list).
    • For complex data types, PG is recommended: arrays, spatial types, network types, JSON, XML, and so on are very mature and support customization.
  • If you need geographic information, spatial data, heterogeneous database access, machine learning, full-text search, images, time series, multi-dimensional data, or word segmentation, and do not want to introduce a new specialized database, PG is recommended.
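To make the Foreign Data Wrapper point concrete, the sketch below sets up postgres_fdw from a Python script and queries a table that lives in another PostgreSQL database through plain SQL. The hostnames, database names, users, and the remote "sales" table are hypothetical, and the sketch assumes the postgres_fdw extension is available and that the account has the privileges to create these objects.

```python
import psycopg2

# Hypothetical connection details; the target is a PostgreSQL instance (e.g. RDS for PostgreSQL).
conn = psycopg2.connect(host="pg.example.internal", dbname="appdb",
                        user="app_user", password="********")
conn.autocommit = True
cur = conn.cursor()

# postgres_fdw lets SQL in this database read tables that live in another PostgreSQL database.
cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
cur.execute("""
    CREATE SERVER IF NOT EXISTS remote_pg
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'other-pg.example.internal', port '5432', dbname 'reporting')
""")
cur.execute("""
    CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER SERVER remote_pg
        OPTIONS (user 'report_user', password '********')
""")
cur.execute("""
    CREATE FOREIGN TABLE IF NOT EXISTS remote_sales (
        id bigint, amount numeric, sold_at timestamptz
    ) SERVER remote_pg OPTIONS (schema_name 'public', table_name 'sales')
""")
cur.execute("SELECT count(*) FROM remote_sales")  # executed against the remote database via the FDW
print(cur.fetchone())
```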

2.1.6 Case

image.png

2.1.7 Cloud database GaussDB (for MySQL)

image.png

  • Shared DFV storage:
    • Only one copy of the data is stored. Adding a read-only node only adds a compute node, with no extra storage to purchase; the more read-only nodes there are, the more storage cost is saved.
  • Active-Active architecture:
    • There is no longer a dedicated standby database; all read-only nodes are in the Active state and carry read traffic, so resource utilization is higher.
  • Log-as-data architecture:
    • Pages no longer need to be flushed to storage; all update operations are recorded only as logs, and double writes are no longer required, reducing consumption of valuable network bandwidth.

2.1.7.1 Parallel execution

image.png

  • On a 32-core, 256 GB instance running TPC-H queries against 100 GB of data, performance improves by 8x in a 16-thread concurrent scenario.

2.1.7.2 Horizontal expansion

image.png

  • GaussDB (for MySQL) read and write performance scales linearly:
    • When nodes are added, because the underlying layer uses DFV distributed storage, newly added nodes do not need their own storage re-partitioned; they share the same storage with the other nodes.

2.1.7.3 Efficient backup

image.png

  • Compared with traditional database failure recovery, GaussDB (for MySQL) restores data gradually and begins serving requests as soon as part of the data is available, continuing until all data is restored; a traditional database only resumes service after recovery is fully complete.

2.1.7.4 Case

image.png

2.1.8 Cloud database GaussDB (for openGauss)

image.png

  • High security:
    • GaussDB (for openGauss) has top-tier commercial database security features: dynamic data masking, TDE transparent encryption, row-level access control, and encrypted computing, meeting the core security requirements of government, enterprise, and financial customers.
  • Sound tooling and service capabilities:
    • GaussDB (for openGauss) can already be deployed commercially on HUAWEI CLOUD and HUAWEI CLOUD Stack, and has ecosystem tools such as DAS, UGO, and DRS, effectively supporting users' daily development, operation and maintenance, optimization, monitoring, and migration work.
  • Full-stack self-developed:
    • GaussDB (for openGauss) is built on the Kunpeng ecosystem and is currently the only domestic brand that achieves full-stack independent controllability. At the same time, it can continuously optimize its lower layers based on the hardware to improve overall product performance.
  • Open-source ecosystem:
    • GaussDB (for openGauss) has been contributed to the open-source community, with the primary/standby edition available for download.

2.1.8.1 Key roles

image.png

  • etcd: consistency component.
  • CMS: cluster management, primary/standby switchover control, and high availability.

2.1.8.2 High Performance - Distributed Parallel Execution Framework

image.png

2.1.9 High Performance - Distributed Transaction Processing Performance, GTM-Lite Technology

image.png

2.1.9.1 Case: GaussDB helps smart business operations

image.png

2.2 Non-relational database services

2.2.1 NoSQL design principles on the cloud

image.png

2.2.2 Document Database Service DDS

image.png

  • Database type and version: Compatible with MongoDB 4.0/4.2 version.
  • Data security: multiple security policies protect the database and user privacy.
  • High data reliability: database storage keeps at least three copies; database data reliability reaches 99.9999999% (9 nines) and backup data durability reaches 99.9999999999% (12 nines).
  • High service availability (intra-city disaster recovery): cluster and replica set instances can be deployed within an AZ or across 3 AZs, with service availability above 99.95%.
  • Instance monitoring: supports monitoring of key performance indicators of the instance OS and DB engine, including compute/memory/storage utilization, I/O activity, database connections, and so on.
  • Elastic scaling: horizontal scaling by adding or deleting shards (up to 32), with support for 7-node replica sets and read-only nodes; vertical scaling by changing instance specifications and expanding storage space (up to 32 x 2 TB).
  • Backup and recovery: backup covers automatic backups, manual backups, full backups, incremental backups, and lifecycle management of backup files (add, delete, copy); recovery supports Point-In-Time Recovery (PITR) to any point within the backup retention period or to a full-backup time point, restoring to a new instance or to the original instance. The backup retention period can be up to 732 days.

2.2.3 High reliability - online expansion without interruption, three copies of storage

image.png

2.2.4 High reliability - data archiving, backup and recovery

image.png

2.2.5 Cross-Availability Zone Backup - Backup is replicated across Regions and can be restored in different places

image.png

2.2.6 Cross-Availability Zone Disaster Recovery - Cross-Region real-time disaster recovery and real-time data synchronization

image.png

2.2.7 Case: Helping the game industry

image.png

  • Game scenario:
  • In game applications, user information such as player equipment and points can be stored in DDS. During peak player activity the concurrency requirements are high, and the DDS cluster type can handle such high-concurrency scenarios; the high-availability features of the DDS replica set and cluster architectures keep games running continuously and stably under high concurrency.
  • In addition, DDS is MongoDB-compatible and schema-free, which avoids the pain of changing table structures as gameplay changes and suits flexible, fast-changing game business needs. Users can keep structured, fixed-schema data in the cloud database RDS, flexible-schema business data in DDS, and hot data in GaussDB (for Redis), achieving efficient access to business data while reducing storage cost (see the sketch after this list).
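The schema-free point can be illustrated with a short, hypothetical sketch using pymongo against a MongoDB-compatible endpoint such as DDS: two player documents carry different optional fields, yet no table alteration is needed and queries stay unchanged. The connection string, port, database, and field names are all placeholders.

```python
from pymongo import MongoClient

# Hypothetical MongoDB-compatible (e.g. DDS) endpoint and credentials.
client = MongoClient("mongodb://app_user:********@dds.example.internal:8635/gamedb")
players = client["gamedb"]["players"]

# Schema-free documents: new fields (e.g. a seasonal pass) can be added per document
# without any ALTER TABLE-style migration.
players.insert_one({"player_id": 1001, "points": 2500,
                    "equipment": ["sword", "shield"]})
players.insert_one({"player_id": 1002, "points": 180,
                    "equipment": ["bow"],
                    "season_pass": {"level": 3, "premium": True}})

# The same query works regardless of which optional fields each document carries.
for doc in players.find({"points": {"$gt": 1000}}, {"_id": 0}):
    print(doc)
```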

2.2.8 Cloud database GaussDB (for NoSQL)

image.png

  • Cassandra-compatible interface:
    • Supports the wide-column data model.
    • Excellent write performance, suitable for IoT, financial anti-fraud detection, and similar scenarios.
  • MongoDB-compatible interface:
    • Supports the document data model.
    • Clear advantages in read/write performance, elasticity, and reliability.
  • Redis-compatible interface:
    • The first Redis database product on the cloud with compute-storage separation.
    • Outstanding advantages in data reliability, scalability, and cost performance.
  • InfluxDB-compatible interface:
    • Cluster architecture and data layout designed for time-series data.
    • High write performance and high compression ratio.

2.2.9 Cloud database GaussDB (for Redis)

image.png

  • GaussDB (for Redis) is cost-effective, elastically scalable, and separates hot and cold data.
  • Cost-effective:
    • Based on shared storage, it greatly reduces the cost of using Redis for massive data while still providing sufficient performance.
    • All data is persisted on disk with hot/cold separation, which removes the problem of keeping a cache and a database consistent with each other and improves both the readability and the operating efficiency of application code.
  • Lossless elastic scaling:
    • Deeply customized RocksDB with second-level, split-based elastic expansion.
    • Scaling out and in is fast and smooth, with no data relocation.
    • Through the proxy layer, upper-layer business is unaware of the data migration the kernel performs during scaling.
  • Hot/cold separation:
    • Hot data stays resident in memory while the full data set is stored persistently, replacing the Redis + MySQL hot/cold architecture. Hot and cold data are exchanged automatically, so users do not need to move data manually and application code is simpler (a usage sketch follows this list).
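Because GaussDB (for Redis) speaks the Redis protocol, a standard Redis client can be used unchanged; the sketch below stores a player profile and updates a leaderboard with redis-py. The endpoint, password, and key names are hypothetical; the point is only that no separate MySQL tier is needed for durability, since the storage layer persists the full data set.

```python
import redis

# Hypothetical Redis-compatible endpoint (e.g. GaussDB (for Redis)); a standard client works.
r = redis.Redis(host="redis.example.internal", port=6379,
                password="********", decode_responses=True)

# Frequently accessed ("hot") data is served from memory; the full data set is persisted
# by the storage layer, so no separate relational tier is needed for durability.
r.hset("player:1001", mapping={"name": "alice", "points": 2500})
r.zincrby("leaderboard", 50, "player:1001")   # bump this player's ranking score

print(r.hgetall("player:1001"))
print(r.zrevrange("leaderboard", 0, 9, withscores=True))  # current top 10
```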

2.2.10 Case: Helping the Energy Industry

image.png

  • While being compatible with the Redis interface, GaussDB (for Redis) also provides large-capacity, low-cost, and highly reliable data storage, making it an ideal choice for such persistent storage scenarios.

2.2.11 Cloud database GaussDB (for Mongo)

image.png

2.2.12 Case: Helping the game industry

image.png

  • Game scenario:
    • Being compatible with the MongoDB protocol, it lets game applications store game data such as user equipment and points. During peak player activity the concurrency requirements are high, and compute nodes can be added quickly and flexibly to handle high-concurrency game workloads.

3. Database Migration Plan

  • HUAWEI CLOUD database migration overall solution

image.png

  • Database migration method:
  • Database migration is usually implemented with a combination of UGO and DRS. When a user migrates a database to HUAWEI CLOUD from off-cloud or from another cloud vendor, UGO is first used to analyze the source database, and the migration is then carried out for the actual scenario with reference to the solution UGO provides. The Data Replication Service (DRS) migrates the data from the source database to the target database using full plus incremental migration.

3.1 Data Replication Service DRS

image.png

  • Easy to operate:
    • In traditional scenarios, a professional technical background is required, the steps are complicated, and the technical threshold is high.
  • Short cycle:
    • In traditional scenarios, manual deployment is required and can take anywhere from several days to weeks or months.
  • Low cost:
    • In traditional scenarios, the upfront investment is high and the business cannot be billed flexibly on demand.
  • Low risk:
    • In traditional scenarios, the business must be interrupted and the migration done manually, with a risk of data loss if the migration fails.

3.1.1 Live Migration

image.png

  • Real-time migration supports multiple network links, such as the public network, VPC network, VPN, and private line, and can quickly implement migration across cloud platforms, from off-cloud databases to the cloud, or across regions within the cloud, covering a variety of business scenarios.

3.1.2 Backup Migration

image.png

3.1.3 Real-time synchronization

image.png

3.1.4 Data Subscription

image.png

3.1.5 Real-time disaster recovery

image.png

3.1.6 Case: Helping the Automobile Industry

image.png

3.2 Database Migration Tool UGO

image.png

  • The service is currently in the commercial stage and is available only in the South China-Guangzhou and Asia Pacific-Singapore regions.

3.2.1 Source database profile

image.png

  • Source database profiling (UGO core capability):
    • The source database profile is trained on massive business scenarios as samples, using key database indicators as features, to abstract a complete picture of the database and provide a sufficient data basis for further accurate and rapid analysis of important information such as the source database's application scenarios and user operating habits.

3.2.2 Target type selection and specifications

image.png

  • Target selection and sizing (UGO core capability):
    • Based on the source database profile, and weighing compatibility, performance, object complexity, usage scenarios, and so on, UGO intelligently recommends suitable target database types and their priorities, as well as the specifications and costs under each choice.

3.2.3 Compatibility Analysis

image.png

  • Compatibility analysis (UGO core capability):
    • Taking the source database profile as input, UGO uses its kernel's conversion rate toward the target database to perform compatibility analysis on 14 core object types, classified as natively supported, supported via UGO conversion, or not supported. Through continuous kernel development over the past few years, trained on hundreds of millions of samples, UGO achieves a high syntax conversion rate.

3.2.4 Workload Assessment

image.png

  • Workload assessment (UGO core capability):
    • Using the actual manual migration costs of massive business scenarios as the evaluation baseline, and the accumulated workload of automated migrations across many business scenarios as input, UGO combines code volume, conversion rate, and the difficulty of reworking incompatible features to output a comprehensive migration workload assessment.

3.2.5 Database structure migration

image.png

  • Database structure migration (UGO core capability):
    • Structure migration takes the pre-migration assessment as its input and guidance. Before conversion, users can customize and filter the objects to be migrated. After conversion, objects that failed to convert are marked together with the reasons for failure; users correct them accordingly and run verification tests. Objects that fail verification return to the correction step and are resubmitted until all objects pass, at which point the migration implementation process is complete.

3.2.6 Applying SQL migrations

image.png

3.2.7 Case: Helping the Communication Industry

image.png

Thinking Questions

image.png
image.png
image.png

The End.
