The difference between distributed database and centralized database

Chapter 1: Differences between distributed databases and centralized databases

1. The database is the core IT infrastructure

• The growth of Internet business drives upgrades of core systems
• Core systems adopt distributed databases and cloud migration to support smooth horizontal scaling

• The large-scale rollout of 5G will drive upgrades of IT systems
• 5G offers high bandwidth and ultra-low latency, which requires database systems to respond faster and handle higher concurrency

• Building a smart government
• "Internet +" services built toward the goal of a smart government place higher demands on database performance and scalability

2. Challenges faced by traditional centralized databases

2.1 Traditional database architecture


2.2 Advantages

• Mature and stable: after nearly 40 years of development, it is used across many industries, and the product technology is very mature and stable
• Strong industry adaptability: it adapts to the varied needs of different industries
• Complete ecosystem: it has a large number of ISV application vendors and technical developers, and its technology, industry, and talent ecosystems are all well established

2.3 Disadvantages

• High cost: the software itself is expensive and depends on high-end hardware, so both CAPEX and OPEX are high
• No horizontal scaling: capacity can only be increased by making the single machine more powerful (adding CPU, memory, or disk, or upgrading from a PC server to a minicomputer), which will eventually hit the upper limit of a single node

3. Sharding (sub-database and sub-table) with database middleware still has shortcomings

• General-purpose databases can be used as-is to scale the database linearly;
• Each database is a standalone single-node database: the databases are not connected to one another and are unaware of each other, so transactions that span databases are handled by the middleware;
• The database middleware connects the databases and implements sharding (sub-database and sub-table).
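The routing role of the middleware described above can be illustrated with a minimal sketch. It assumes hash-based sharding on a single sharding key; the shard names (`db0` to `db3`) and the helper functions `route` and `group_by_shard` are hypothetical and do not correspond to any particular middleware product.

```python
import hashlib

# Hypothetical shard names; a real middleware would map these to connections.
SHARDS = ["db0", "db1", "db2", "db3"]

def stable_hash(key: str) -> int:
    """Deterministic hash of a key (Python's built-in hash() of str varies per run)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def route(sharding_key: str, shards=SHARDS) -> str:
    """Return the single database that owns the row with this sharding key."""
    return shards[stable_hash(sharding_key) % len(shards)]

def group_by_shard(keys, shards=SHARDS):
    """Group keys by owning shard, as a middleware would before fanning out a query."""
    plan = {}
    for key in keys:
        plan.setdefault(route(key, shards), []).append(key)
    return plan

plan = group_by_shard(["user:1", "user:2", "user:3", "user:4", "user:5"])
for shard, keys in sorted(plan.items()):
    print(shard, keys)
```

Each backend database only ever sees the rows routed to it, which is why any operation touching keys on more than one shard has to be coordinated in the middleware.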

3.1 Advantages

• Linear expansion: sharding (sub-database and sub-table) quickly provides horizontal scaling of the database
• Low technical cost: the core database engine needs little or no modification

3.2 Disadvantages

• Cross-database distributed transactions: the core database engine has no distributed capabilities, so distributed processing can only be done in the middleware. It is difficult for middleware to achieve RPO=0, so the ACID properties of distributed transactions cannot be fully guaranteed when exceptions or failures occur
• Global consistency: because the timestamps of the database servers are not consistent with one another, it is difficult to keep data version numbers globally consistent across databases
• Load balancing: when scaling out or in, the underlying database engine cannot adjust data distribution online, so the business must be paused while the data is redistributed, which is a major challenge for both the business and operations
• Cross-database complex SQL: complex cross-database SQL operations (such as multi-table joins that do not use the sharding key) can only be executed in the middleware. Middleware has no distributed parallel computing capability, which ends up restricting the SQL that applications can use and makes the solution intrusive to the business
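To make the load-balancing point concrete, here is a rough sketch of how much data has to move when the shard count changes. It assumes simple hash-modulo placement, which is only one possible sharding rule; the key format and the helpers `shard_of` and `moved_fraction` are illustrative assumptions.

```python
import hashlib

def shard_of(key: str, shard_count: int) -> int:
    """Hash-modulo placement: an assumed, simplified sharding rule."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose owning shard changes after resizing."""
    moved = sum(1 for k in keys if shard_of(k, old_count) != shard_of(k, new_count))
    return moved / len(keys)

# Synthetic keys, for illustration only.
keys = [f"order:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_count=4, new_count=5)
print(f"{fraction:.0%} of keys change shard when growing from 4 to 5 shards")
```

With a uniform hash, a key stays in place only when its hash gives the same remainder modulo 4 and modulo 5, so roughly four out of five keys have to be relocated. If the engine cannot do this online, that relocation is the step for which the business is paused.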

4. Native distributed relational database architecture


4.1 Advantages

• High data reliability and high service availability: an industrial-grade implementation of the Paxos multi-replica consensus protocol guarantees zero data loss (RPO=0) and fast service recovery (RTO<30 seconds) when individual nodes fail
• Linear expansion: capacity can be scaled out as business volume grows (for example, during online promotions) and scaled in as it falls (for example, after promotions)
• Low cost: high availability is achieved on ordinary x86 servers, with no need for high-end minicomputers and storage
• Global consistency: distributed transactions are supported to ensure global consistency, and complex distributed queries are supported
• Flexible deployment: three-center, five-center, active-standby, and other deployment modes are supported
• Transparent to the business: business systems can use the distributed database as if it were a single-node database, so migration and retrofit costs are low
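The RPO=0 claim rests on majority agreement among replicas. The sketch below shows only the quorum arithmetic behind majority-based protocols such as Paxos; it is not the Paxos protocol itself, nor a description of how OceanBase implements it, and all function names are illustrative assumptions.

```python
from itertools import combinations

def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1

def tolerated_failures(replica_count: int) -> int:
    """How many replicas can fail while a majority is still reachable."""
    return replica_count - majority(replica_count)

def is_committed(ack_count: int, replica_count: int) -> bool:
    """A log entry counts as committed once a majority has persisted it."""
    return ack_count >= majority(replica_count)

def majorities_always_overlap(replica_count: int) -> bool:
    """Check by enumeration that any two majorities share at least one replica."""
    quorums = [set(q) for q in combinations(range(replica_count), majority(replica_count))]
    return all(a & b for a in quorums for b in quorums)

for n in (3, 5):
    print(n, "replicas:",
          "majority =", majority(n),
          "| tolerated failures =", tolerated_failures(n),
          "| majorities overlap =", majorities_always_overlap(n))
```

Because any two majorities overlap, a newly elected leader can always reach at least one replica that holds every committed entry, which is why losing a minority of nodes does not lose committed data. The same arithmetic is why replicas are typically spread across an odd number of locations, as in the three-center and five-center deployments mentioned above.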

5. Comparison between OceanBase and traditional databases

| Dimension | Traditional centralized database | Distributed database represented by OceanBase |
|---|---|---|
| Product architecture | The classic "single-point centralized" architecture, using a "Share-Everything" design. Built on high-end hardware, such as IBM high-end servers and EMC high-end storage devices. | A native "distributed" database that adopts the most stringent Paxos distributed consensus protocol in the industry. Designed for ordinary PC hardware; does not require high-end hardware. |
| Data reliability and service high availability | Relies on high-end hardware to ensure data reliability and uses "master-slave replication". If the master node fails, data is lost (RPO>0); the service cannot be restored automatically, and the service recovery time (RTO) is usually measured in hours. | Runs on ordinary PC hardware and uses the Paxos distributed consensus protocol to ensure data reliability. If the master node fails, Paxos guarantees no data loss (RPO=0) and automatically elects a new master and restores service, with a service recovery time (RTO) within 30 seconds. |
| Scalability | Data storage can only scale vertically within a single node and will eventually reach the capacity limit of the single-node architecture. Compute nodes are generally not scalable. A few modes (such as RAC and pureScale) allow compute nodes to be added, but the compute nodes still access single-point shared storage, and the number that can be added is limited. | Under the MPP architecture, both data nodes and compute nodes scale horizontally. There is no limit on the number of data or compute nodes; given sufficient network bandwidth, they can be expanded to any number. |
| Application scenarios | Concentrated on the core systems of enterprise customers (finance, telecommunications, government and enterprise, etc.); unable to cope with Internet business scenarios, with few application cases there. | Alipay core, MYbank core, many of Alibaba's businesses, and a number of external commercial banks; gradually expanding into traditional businesses. |
| Cost | More expensive: requires paying for high-end hardware, high software licensing fees, and product service fees. | Relatively low: the design based on PC hardware reduces hardware costs, and software licensing and service costs are also competitive. |

6. Summary

After nearly 40 years of development, the traditional centralized database has become very mature. However, in the current era of big data, traditional databases still face many challenges. Distributed databases can effectively address these challenges and are a key direction for future database development.
1: Traditional databases often place high demands on hardware infrastructure and can only scale vertically, not horizontally, so they easily reach a performance ceiling;
2: Although sharding (sub-database and sub-table) can scale horizontally, it brings new problems, such as lack of support for complex SQL and difficulty guaranteeing ACID for distributed transactions;
3: Distributed databases can effectively solve these problems. Applications can use a distributed database as they would a centralized one, and distributed databases offer low hardware cost, high scalability, and high availability.

Origin blog.csdn.net/Redamancy06/article/details/128025698