What exactly should be designed for database software architecture?

1. Basic concepts

Concept 1: Single database


Concept 2: Sharding

Sharding solves the problem of "too much data" and is commonly referred to as "horizontal partitioning".

Once sharding is introduced, a new problem inevitably arises: "data routing", that is, deciding which database a given piece of data should be read from or written to. There are usually three kinds of routing rules:

(1) Range

Advantages: simple and easy to expand.
Disadvantages: the load on each database is uneven (the newest id range is the most active).

(2) Hash

Advantages: simple, balanced data, even load.
Disadvantages: migration is troublesome (expanding from 2 databases to 3 requires migrating data).

(3) Unified routing service: router-config-server

Advantages: very flexible; the business logic is decoupled from the routing algorithm.
Disadvantages: one extra lookup before every database access.

Most Internet companies adopt the second method: hash routing.
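The range and hash rules above can be sketched in a few lines of Python. This is only an illustration: the range boundaries, the function names, and the use of a plain modulo as the hash are assumptions, not something the original article specifies.

```python
import bisect

# Hypothetical range boundaries: shard i holds the uids below RANGE_UPPER_BOUNDS[i]
# that are not already covered by an earlier shard.
RANGE_UPPER_BOUNDS = [10_000_000, 20_000_000, 30_000_000]


def route_by_range(uid: int) -> int:
    """Return the index of the shard whose uid range contains uid."""
    idx = bisect.bisect_right(RANGE_UPPER_BOUNDS, uid)
    if idx == len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"uid {uid} is outside every configured range")
    return idx


def route_by_hash(uid: int, shard_count: int) -> int:
    """Return the shard index using a simple modulo hash."""
    return uid % shard_count


def moved_fraction(uids, old_count: int, new_count: int) -> float:
    """Fraction of uids that land on a different shard after resharding."""
    uids = list(uids)
    moved = sum(
        1 for u in uids
        if route_by_hash(u, old_count) != route_by_hash(u, new_count)
    )
    return moved / len(uids)
```

With range routing, expansion only means appending a new boundary, and existing data stays where it is. With modulo hashing, moved_fraction(range(600), 2, 3) shows that two thirds of the uids change shard when going from 2 databases to 3, which is the migration cost mentioned above.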

Concept 3: Grouping

Grouping solves the problems of availability and performance, and is usually implemented through master-slave replication.

In practice, the database software architecture at Internet companies is "sharded and grouped": the data is split across shards, and each shard is a master-slave group.
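As a rough sketch of how sharding and grouping combine, the following Python picks a group by hash and then a node inside the group. The node names, the two-shard layout, and the round-robin choice of slave are all hypothetical.

```python
import itertools


class Group:
    """One shard: a master plus its slaves (node names are placeholders)."""

    def __init__(self, master, slaves):
        self.master = master
        # Rotate reads over the slaves; fall back to the master if there are none.
        self._readers = itertools.cycle(slaves or [master])

    def node_for(self, is_write):
        """Writes go to the master, reads go to the next slave in turn."""
        return self.master if is_write else next(self._readers)


# Hypothetical layout: 2 shards, each a group of 1 master and 2 slaves.
GROUPS = [
    Group("db0-master", ["db0-slave1", "db0-slave2"]),
    Group("db1-master", ["db1-slave1", "db1-slave2"]),
]


def pick_node(uid, is_write):
    """Step 1: choose the shard by hash. Step 2: choose a node in its group."""
    group = GROUPS[uid % len(GROUPS)]
    return group.node_for(is_write)
```

Routing (which shard) and replication (which node inside the shard) are independent decisions here, so either one can be changed without touching the other.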

So what does the design of database software architecture involve? At least the following four points must be considered:

  • How to ensure data availability
  • How to improve database read performance (most applications read far more than they write, so reads become the bottleneck first)
  • How to ensure consistency
  • How to improve scalability

2. How to ensure the availability of data?

The way to solve availability problems is redundancy.

How to ensure the availability of the site? Redundant sites.
How to ensure the availability of services? Redundant services.
How to ensure the availability of data? Redundant data.

Data redundancy has a side effect: consistency problems.

How to ensure high availability of database "reads"?

Redundant read databases.

What are the side effects of redundant read databases?
Replication from the write database to the read databases is delayed, so the data may be inconsistent.
Many Internet companies run MySQL this way: one write database with redundant read databases. Writes are still a single point, so high availability of writes cannot be guaranteed.

How to ensure high availability of database "writes"?

Redundant write databases.
With two masters backing each other up (dual-master mode), the write database becomes redundant.

What are the side effects of redundant write databases?
The two masters synchronize with each other, so data may conflict (for example, auto-increment ids may collide during synchronization).

There are two common ways to resolve such synchronization conflicts:
(1) The two write databases generate auto-increment ids with different initial values and the same step size: write database 1 uses ids 0, 2, 4, 6...; write database 2 uses ids 1, 3, 5, 7...;
(2) The database's auto-increment id is not used; the business layer generates its own unique ids so that data does not conflict.
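Both options can be illustrated with a short Python sketch. The function names are hypothetical, and the random UUID in option (2) is only one possible way for a business layer to generate unique ids.

```python
import itertools
import uuid


def auto_increment_ids(start: int, step: int):
    """Return an iterator over start, start + step, start + 2 * step, ..."""
    return itertools.count(start, step)


def business_id() -> str:
    """Generate an id in the business layer (assumption: a random UUID)."""
    return uuid.uuid4().hex


# Option (1): same step size, different initial values.
master_1_ids = auto_increment_ids(start=0, step=2)
master_2_ids = auto_increment_ids(start=1, step=2)

first_from_1 = list(itertools.islice(master_1_ids, 4))  # [0, 2, 4, 6]
first_from_2 = list(itertools.islice(master_2_ids, 4))  # [1, 3, 5, 7]
```

Because the step equals the number of masters and the initial values differ, the two sequences never overlap, so rows inserted on either master keep their ids after synchronization.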

Alibaba Cloud's RDS service is advertised as highly available. How is that achieved?
It uses something similar to the "dual-master synchronization" approach (there are no slave databases).
It is still dual-master, but only one master serves reads and writes; the other is a "shadow-master" that exists only to ensure high availability and normally serves no traffic.

If the master goes down, the shadow-master takes over and the virtual IP floats to it. This is transparent to the business layer and requires no manual intervention.
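The failover behaviour can be modelled with a toy Python class. This is only a simulation of the idea: the class name, the health-check callback, and the node names are assumptions, and a real virtual IP is managed at the network level rather than in application code.

```python
class VirtualIP:
    """Stands in for a virtual IP that always points at the serving master."""

    def __init__(self, master, shadow_master):
        self.active = master
        self.standby = shadow_master

    def failover_if_needed(self, is_healthy):
        """Point the VIP at the standby if the active node fails its health check."""
        if not is_healthy(self.active):
            self.active, self.standby = self.standby, self.active
        return self.active


# The business layer only ever asks the VIP for the current node.
down_nodes = set()
vip = VirtualIP("master", "shadow-master")
before = vip.failover_if_needed(lambda node: node not in down_nodes)

down_nodes.add("master")  # simulate the master going down
after = vip.failover_if_needed(lambda node: node not in down_nodes)
```

The caller's code path is identical before and after the failure, which is what "transparent to the business layer" means here.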

The advantages of this approach:
(1) Reads and writes go to the same master, so there is no replication delay between them and no consistency problem;
(2) Both reads and writes are highly available.

The disadvantages are:
(1) Read performance cannot be scaled by adding slave databases;
(2) Resource utilization is only 50%, because the redundant master serves no traffic.
Note: this is why high-availability RDS is quite expensive.

3. How to expand read performance?

There are roughly three ways to improve read performance. The first is to add indexes.

This approach needs little elaboration. One point worth mentioning is that different databases can have different indexes.
For example:
(1) The write database creates no indexes;
(2) The online read database creates indexes for online access, such as on uid;
(3) The offline read database creates indexes for offline access, such as on time.
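To make the idea concrete, the sketch below uses three in-memory SQLite databases as stand-ins for the write database, the online read database, and the offline read database. The table name, column names, and index names are hypothetical, and SQLite is used only so that the example runs without a server.

```python
import sqlite3

# Same table on every database; only the indexes differ.
SCHEMA = (
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, uid INTEGER, create_time INTEGER)"
)


def make_db(index_sql=None):
    """Create an in-memory database with the shared schema and an optional index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    if index_sql:
        conn.execute(index_sql)
    return conn


def index_names(conn):
    """List the names of the indexes defined on the orders table."""
    return [row[1] for row in conn.execute("PRAGMA index_list(orders)")]


write_db = make_db()  # no secondary index, so writes stay cheap
online_read_db = make_db("CREATE INDEX idx_uid ON orders (uid)")
offline_read_db = make_db("CREATE INDEX idx_time ON orders (create_time)")
```

Each database then pays the index maintenance cost only for the queries it actually serves.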

The second way to expand read performance is to add slave databases.

This approach is widely used but has two disadvantages:
(1) The more slave databases, the slower the synchronization;
(2) The slower the synchronization, the larger the window of data inconsistency.

The third way to improve read performance is to add a cache.

A common cache architecture looks like this:
(1) Upstream is the business application;
(2) Downstream are the master database, the slave databases (read-write separation), and the cache.

If the system architecture is service-oriented:
(1) Upstream is the business application;
(2) In the middle is the service layer;
(3) Downstream are the master database, the slave databases, and the cache.
The business layer does not access the db and cache directly; the service layer hides the complexity of the underlying db and cache.
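A minimal sketch of such a service layer is shown below, with plain dictionaries standing in for the master database, a slave database, and the cache. The class and method names are hypothetical, and deleting the cached key on write is an assumption (one common pattern), not something the original article prescribes.

```python
class UserService:
    """Service layer: callers never see the master, the slave, or the cache."""

    def __init__(self, master, slave, cache):
        self.master = master
        self.slave = slave
        self.cache = cache

    def get_user(self, uid):
        """Read through the cache; on a miss, read the slave and fill the cache."""
        user = self.cache.get(uid)
        if user is None:
            user = self.slave.get(uid)
            if user is not None:
                self.cache[uid] = user
        return user

    def save_user(self, uid, user):
        """Write to the master and drop the cached copy (assumed strategy)."""
        self.master[uid] = user
        self.cache.pop(uid, None)


# Dictionaries stand in for real storage; replication is not simulated.
master, slave, cache = {}, {1: "alice"}, {}
service = UserService(master, slave, cache)
```

The business application calls only get_user and save_user, so the storage layout behind the service can change without affecting callers.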

Whether read performance is expanded with master-slave replication or with a cache, the data is stored in multiple copies (master + slave, db + cache), which inevitably causes consistency problems.

4. How to ensure consistency?

There are usually two solutions for the consistency of the master-slave database:

(1) Middleware

If a key has been written, the middleware also routes reads of that key to the master database during the inconsistency window.
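The routing rule can be sketched as follows. The class name, the fixed window length, and the injectable clock are assumptions made for illustration; a real middleware would also need to estimate the actual replication delay.

```python
import time


class ReadWriteRouter:
    """Route reads of recently written keys to the master for a fixed window."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.last_write = {}  # key -> time of the most recent write

    def on_write(self, key):
        """Record the write time; writes always go to the master."""
        self.last_write[key] = self.clock()
        return "master"

    def route_read(self, key):
        """Return "master" inside the inconsistency window, otherwise "slave"."""
        written_at = self.last_write.get(key)
        if written_at is not None and self.clock() - written_at < self.window:
            return "master"
        self.last_write.pop(key, None)  # window has passed; forget the key
        return "slave"
```

Only keys written within the window pay the cost of a master read, so most reads still go to the slave databases.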

(2) Forcing reads to go to the master

With the "dual-master high-availability" architecture, reads and writes both go to the master, which largely avoids the master-slave consistency problem.

The second type of inconsistency is the inconsistency between db and cache.
This kind of inconsistency is described in detail in "Cache Architecture, Is One Enough?", so this article will not expand on it.

In addition, for every business scenario that tolerates cache misses, it is recommended to set an expiration time on the keys in the cache, so that even if an inconsistency occurs, the data has a chance to repair itself.
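The self-repair effect of an expiration time can be shown with a small in-process cache. The class name and the injectable clock are hypothetical; a real cache server would normally handle expiry itself.

```python
import time


class TTLCache:
    """A tiny cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        """Store the value together with its expiry time."""
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: the next read must go to the db
            return None
        return value
```

Even if a stale value is cached, it can survive for at most ttl_seconds; after that a read misses, goes to the database, and reloads the current value.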

5. How to ensure the scalability of the database?

To double the number of databases within seconds, see:
"Smooth expansion of 100 million data DB in seconds"

If the expansion is not a doubling, see:
"10 billion data smooth data migration without affecting service"

Fields (attributes) may also need to be extended, see:
"10,000 attributes, 10 billion data, architecture design?"

These solutions are covered in the related articles, so they are not repeated here.

In summary, database software architecture design should cover:

  • Availability
  • Read performance
  • Consistency
  • Scalability

I hope this helps everyone understand database software architecture systematically.
Architect's Road: sharing technologies that can be put into practice

Origin blog.51cto.com/jyjstack/2548567