Distributed architecture: design and analysis of the data access layer

(1) The challenges of moving the database from a single machine to a distributed setup, and how to respond

(1) Ways to relieve database pressure

1: Upgrade the hardware of the single machine (vertical scaling)
2: Optimize at the application level to reduce the pressure of database access
3: Introduce a cache and a search engine to reduce the read pressure on the database
4: Split the data and the access load across multiple databases (horizontal scaling)

(2) Analysis of vertical and horizontal data partitioning

Definitions:
Vertical partitioning: data belonging to different business units is moved into different databases. The split follows business boundaries, so each database has a degree of data independence, for example a user database and an order database.
Horizontal partitioning: data of the same business unit is split into multiple databases according to some rule. The union of all horizontal partitions makes up the complete business data set.

Analysis of vertical partitioning

1. The single-machine ACID guarantee is broken. Once the data lives on multiple machines, processing logic that used to be handled by a single-machine transaction is heavily affected. We must either give up the original single-machine transaction and modify the implementation, or introduce distributed transactions.

2. Some Join operations become harder, because the data involved may now live in two databases. The database's own Join can no longer be used directly, and the application or some other mechanism has to take over.
3. Scenarios that rely on foreign-key constraints are affected.

Analysis of horizontal partitioning
1: The ACID guarantee may also be broken.
2: Join operations may also be affected.
3: Scenarios that rely on foreign-key constraints are affected.
4: The unique Ids generated by a single database's auto-increment sequence are affected.
5: Queries against a single logical table now have to cross databases.

(2) Distributed transaction

(1) Distributed transaction model and specification

1: The X/Open organization (now The Open Group) proposed a distributed transaction specification called XA
2: X/Open also defined a distributed transaction processing model, the X/Open DTP model
3: Three components in the DTP model: AP, RM, TM

Transaction: A transaction is a complete unit of work, composed of multiple independent computing tasks, which are logically atomic.

Global transaction: a transaction that operates on multiple resource managers (RMs) at once is a global transaction.

Branch transaction: within a global transaction, each resource manager performs its own independent tasks; the set of tasks performed on one resource manager is that resource manager's branch transaction.

Thread of control: represents the worker thread that associates the AP, TM, and RM, i.e. the transaction context. It is what ties a global transaction to its branch transactions.

AP: Application Program, i.e. a program that uses the DTP model. It defines the transaction boundaries and the specific operations that make up the transaction.

RM: Resource Manager, which can be thought of as a DBMS or a message server. The application controls resources through the resource manager, and the resource must implement the interfaces defined by XA. The resource manager stores and manages the shared resources.

TM: Transaction Manager, responsible for coordinating and managing transactions. It provides the programming interface to the AP and manages the resource managers. The transaction manager assigns each transaction an identifier, monitors its progress, and handles its completion or failure. The transaction branch identifier (the XID) is assigned by the TM to identify a global transaction and a specific branch within an RM; it is the correlation key between the TM's log and the RM's log. Two-phase commit or rollback needs the XID in order to perform resynchronization ("resync") when the system starts up, or to let an administrator carry out heuristic completion (manual intervention) when required.
Note:
AP and RM are mandatory; TM is the component we introduce additionally. The reason for introducing the TM is that, in a distributed system, two machines cannot in theory reach a consistent state on their own, so a single coordination point has to be introduced. The transaction manager controls the global transaction, manages its life cycle, and coordinates the resources.

(2) Two-phase commit (2PC)

The two-phase commit protocol (2PC, Two Phase Commitment Protocol) gets its name by contrast with the commit of a single-database transaction: on a single database, once the data operations are done we simply commit or roll back. In a distributed system, a prepare phase is added before the commit, hence two-phase commit.

Phase one: prepare.
Phase two: commit or roll back.

In practice things are much more complicated, because the transaction manager itself must be stable and available and because network communication can fail. The transaction manager also coordinates multiple resources and has to do a great deal of logging itself. The extra network round trips and the overhead of introducing the transaction manager are the two main costs of distributed transactions based on two-phase commit.
Therefore, after a vertical or horizontal split, you should think carefully about whether a two-phase distributed transaction is really required; it is recommended to use it only when necessary.
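To make the two phases concrete, here is a minimal sketch of a coordinator (TM) driving several participants. The `Participant` interface is a hypothetical stand-in for the XA calls a real TM would issue; a real coordinator would also persist its decision before phase two and retry failed commits.

```java
import java.util.List;

// Hypothetical participant interface standing in for an XA resource manager (RM).
interface Participant {
    boolean prepare(String xid);   // phase 1: vote yes/no after persisting undo/redo information
    void commit(String xid);       // phase 2: make the prepared work durable
    void rollback(String xid);     // phase 2 alternative: undo the prepared work
}

class TwoPhaseCommitCoordinator {
    // Drives one global transaction identified by xid across all participants.
    boolean execute(String xid, List<Participant> participants) {
        // Phase 1: ask every participant to prepare; any "no" vote aborts the transaction.
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare(xid)) {
                allPrepared = false;
                break;
            }
        }
        // Phase 2: commit everywhere if all voted yes, otherwise roll back everywhere.
        // (A real TM logs this decision first so it can resync after a crash.)
        for (Participant p : participants) {
            if (allPrepared) {
                p.commit(xid);
            } else {
                p.rollback(xid);
            }
        }
        return allPrepared;
    }
}
```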

(3) CAP and BASE

The CAP theory was proposed by Eric Brewer at the PODC meeting in July 2000. The meaning of CAP is as follows.

Consistency: all nodes see the same data at the same time. This is data consistency (the C): once a write succeeds, every node sees the new data.

Availability: every request receives a response indicating success or failure. This is availability (the A); the point is that the system must always respond.

Partition tolerance: the system continues to operate despite arbitrary message loss or the failure of part of the system. This is partition tolerance (the P): the system keeps working even when part of it has problems or messages are lost.

A distributed system cannot satisfy all three at the same time. We can strengthen two of them, while the third has to be sacrificed. When designing and weighing a system, you are in effect choosing CA, AP, or CP.

Choosing CA means giving up partition tolerance and strengthening consistency and availability; this is effectively the choice of a traditional single-machine database.
Choosing AP means giving up consistency and pursuing partition tolerance and availability; this is the design choice of many distributed systems, for example many NoSQL systems.
Choosing CP means giving up availability and pursuing consistency and partition tolerance; availability under this option is relatively low, and a network problem can make the whole system unavailable.

In distributed systems, we generally choose to strengthen availability and partition tolerance at the expense of consistency (AP). This does not mean we do not care about consistency; rather, we first satisfy A and P, and then look at how to deal with C.

Let's take a look at the BASE model. The meaning of BASE is as follows.
Basically Available: the system is basically available; partition failures are allowed.
Soft state: a soft state is accepted, i.e. the data may be out of sync for some period of time.
Eventually consistent: the data is guaranteed to reach a consistent state in the end.

When we choose A and P out of CAP, the strategy we adopt for C is eventual consistency: we do not guarantee that all nodes are consistent immediately after data changes, but they become consistent eventually. Large-scale websites generally do not choose strong consistency; in order to keep scalability and availability, they adopt an eventual-consistency strategy instead.

(3) The Paxos protocol

(1) Basic knowledge

1: Paxos has one prerequisite: there is no Byzantine Generals problem. The Byzantine Generals problem describes a situation where a trustworthy communication environment cannot be guaranteed. Paxos assumes a trustworthy communication environment, i.e. messages are accurate and have not been tampered with.

2: The Paxos algorithm was introduced through the story of a Greek island called Paxos, whose parliament passes resolutions; the algorithm is explained in terms of how those resolutions get approved.

3: Paxos divides the participants into the roles Proposers, Acceptors, and Learners; a participant can hold several roles at once.

Proposers: the role that puts forward proposals.
Acceptors: the role that judges proposals. After receiving a proposal, an Acceptor chooses whether to accept it; if a majority of Acceptors accept a proposal, it is approved (Chosen).
Learners: a role that can only "learn" approved resolutions; it merely observes what has been passed.

4: Related terms
Proposal: a proposal, put forward by Proposers and approved or rejected by Acceptors.
Value: a resolution, the content of a proposal; each proposal consists of a {number, value} pair.

5: A resolution (Value) can only be approved after it has been proposed by a Proposer (an unapproved resolution is still just a Proposal).
In one run of the Paxos algorithm only one Value can be approved (Chosen), and Learners can only obtain Chosen Values.

(2) The Basic Paxos flow

1: Prepare
A Proposer puts forward a proposal numbered N, where N is greater than any proposal number this Proposer has used before, and sends a Prepare request to a quorum of Acceptors.

2: Promise
If N is greater than any proposal number this Acceptor has already responded to, it promises (accepts the Prepare request); otherwise it rejects it.

3: Accept
If a majority of Acceptors promise, the Proposer issues an Accept request containing the proposal number and the proposal content.

4: Accepted
If the Acceptor has not promised any proposal with a number greater than N in the meantime, it accepts the proposal content; otherwise it ignores the request.

Detailed Paxos algorithm process:
https://zh.wikipedia.org/wiki/Paxos%E7%AE%97%E6%B3%95
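As a rough illustration of the acceptor side of the two phases above, here is a minimal sketch of the state an Acceptor keeps and how it answers Prepare and Accept requests. The class and method names are my own, not from any particular Paxos implementation.

```java
import java.util.Optional;

// Minimal Basic-Paxos acceptor state machine (illustrative only, not production code).
class Acceptor {
    private long promisedN = -1;          // highest proposal number promised so far
    private long acceptedN = -1;          // number of the proposal accepted so far, if any
    private Object acceptedValue = null;  // value of the accepted proposal, if any

    // Phase 1: Prepare(n). Promise only if n is greater than anything promised before.
    synchronized Optional<Object[]> onPrepare(long n) {
        if (n > promisedN) {
            promisedN = n;
            // Return any previously accepted proposal so the proposer must adopt its value.
            return Optional.of(new Object[] { acceptedN, acceptedValue });
        }
        return Optional.empty(); // reject the Prepare request
    }

    // Phase 2: Accept(n, value). Accept unless a higher-numbered Prepare was promised meanwhile.
    synchronized boolean onAccept(long n, Object value) {
        if (n >= promisedN) {
            promisedN = n;
            acceptedN = n;
            acceptedValue = value;
            return true;  // accepted; a value is Chosen once a majority of acceptors accept it
        }
        return false;     // ignore / reject
    }
}
```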

Explanation:
1: The core principle of Paxos is that the minority obeys the majority.
2: If several Proposers submit proposals at the same time, they may collide and fail, and each then has to increase its proposal number and resubmit. The resubmissions can conflict again, forcing yet another round of number increases, and so on, which produces a livelock. The usual solution is to elect a leader for the whole cluster and have all proposals go through it, which avoids such conflicts. This effectively turns proposing into a single point, and the new problem becomes what to do when the leader fails, which requires leader election.

(4) The Sequence (unique Id) problem across multiple machines

After splitting horizontally into multiple databases, the Sequence and auto-increment Id mechanisms of the single database have to change. Oracle provides Sequences and MySQL provides Auto Increment columns, so on a single database it is easy to get a non-repeating, self-incrementing Id. After splitting into multiple databases and tables this becomes a problem. We can think about it along two dimensions: uniqueness and continuity.

If we only care about uniqueness of the Id, we can generate UUIDs, or combine various seeds (identifiers from different dimensions such as IP, MAC, machine name, time, a local counter, and so on) according to the business situation to generate a unique Id. Uniqueness is guaranteed this way, but continuity across the whole distributed system is poor.

Continuity here means that the Ids generated across the whole distributed environment are continuous. On a single machine this is handled by a single point; in a distributed system, an independent Id-generation service can take over the job.
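For the uniqueness-only case, a simple sketch of the seed-based idea is shown below: a timestamp, a machine identifier and a local counter are packed into one 64-bit Id. The bit layout and class name are my own illustrative choices, not a standard.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative unique-Id generator: timestamp + machine id + local counter.
// Ids are unique across machines (assuming distinct machineId values) but not continuous.
class SeedIdGenerator {
    private final long machineId;            // e.g. derived from IP/MAC; must be unique per node
    private final AtomicLong counter = new AtomicLong();

    SeedIdGenerator(long machineId) {
        this.machineId = machineId & 0x3FF;  // keep 10 bits for the machine id
    }

    long nextId() {
        long millis = System.currentTimeMillis();          // time component
        long seq = counter.getAndIncrement() & 0xFFF;      // 12-bit local counter
        return (millis << 22) | (machineId << 12) | seq;   // pack into one 64-bit id
    }
}
```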

(5) Query problems across multiple machines

(1) Join problem

After the database is split, some existing Join operations have to be changed. If the data to be joined is already spread across multiple databases, a cross-database Join is needed, which is troublesome. Typical solutions are:
1: Turn the original Join into a multi-step operation at the application level: first query one table, then use the results to query the other (a sketch of this follows the list).
2: Add appropriate redundancy, i.e. duplicate the fields that would otherwise require a multi-table query into one table, reducing the need for multi-table joins.
3: Use an external system (such as a search engine) to handle some of the cross-database queries.
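For the first approach, a Join such as "orders joined with users" becomes two queries plus an in-memory merge. The table and column names below are made up for illustration; this is only a sketch, assuming the two tables now live in different databases.

```java
import java.sql.*;
import java.util.*;

// Illustrative two-step replacement for a cross-database JOIN of orders and users.
class CrossDbJoin {
    static List<Map<String, Object>> ordersWithUser(Connection orderDb, Connection userDb,
                                                    long userId) throws SQLException {
        List<Map<String, Object>> result = new ArrayList<>();
        // Step 1: query the order database for this user's orders.
        try (PreparedStatement ps = orderDb.prepareStatement(
                "SELECT order_id, amount FROM orders WHERE user_id = ?")) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    Map<String, Object> row = new HashMap<>();
                    row.put("orderId", rs.getLong("order_id"));
                    row.put("amount", rs.getBigDecimal("amount"));
                    result.add(row);
                }
            }
        }
        // Step 2: query the user database once and merge the user name into every order row.
        try (PreparedStatement ps = userDb.prepareStatement(
                "SELECT name FROM users WHERE user_id = ?")) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                String name = rs.next() ? rs.getString("name") : null;
                for (Map<String, Object> row : result) {
                    row.put("userName", name);
                }
            }
        }
        return result;
    }
}
```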

(2) Foreign key constraints

Foreign-key constraints are harder to deal with, because the database itself can no longer fully provide what it used to. If you want to keep foreign-key constraints inside each individual database after the split, the data in each database must be self-contained; otherwise the checks can only rely on validation and fault tolerance at the application layer.

(3) Combined query

This scenario differs from the cross-database Join above. A cross-database Join joins different logical tables which, after the split, may live in multiple databases. A combined query is a query against a single logical table which, because it is physically split into multiple databases and tables, has to merge data from several sources.

Considerations:
1: Sorting: after the results come back from the multiple data sources, sort them at the application layer.
2: Function processing: apply Max, Min, Sum, Count, etc. to the values returned by the multiple data sources and combine them.
3: Averages: when querying multiple data sources, the SQL must be changed to return Sum and Count; the average is then computed from the combined Sum and Count across all data sources. This is a point that needs attention (see the sketch after this list).
4: Non-sorted paging: this depends on the chosen strategy, i.e. whether paging is done on each data source with the same step size or in the same proportion. Same step size means that every page contains the same number of records from each data source; same proportion means that within every page, the number of records from a data source is proportional to that data source's share of the total number of records matching the query.
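A concrete sketch of point 3: each shard returns SUM and COUNT, and the application combines them. Averaging the per-shard averages would be wrong whenever the shards hold different row counts. Table and column names are illustrative.

```java
import java.sql.*;
import java.util.List;

// Illustrative cross-shard average: ask each shard for SUM and COUNT, then combine.
class ShardedAverage {
    static double averageAmount(List<Connection> shards) throws SQLException {
        double totalSum = 0;
        long totalCount = 0;
        for (Connection shard : shards) {
            try (Statement st = shard.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT SUM(amount) AS s, COUNT(*) AS c FROM orders")) {
                if (rs.next()) {
                    totalSum += rs.getDouble("s");
                    totalCount += rs.getLong("c");
                }
            }
        }
        // Combine the partial aggregates: AVG(all rows) = SUM(sums) / SUM(counts).
        return totalCount == 0 ? 0 : totalSum / totalCount;
    }
}
```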

(6) Design and implementation of the data access layer

The data access layer is an abstraction layer that gives applications convenient read/write access to data. It solves, in one place, the database-access problems common to the various applications. In a distributed system it is also called the distributed data access layer, or the data layer for short.

(1) Ways the data access layer can be exposed to applications

The first option is to expose a proprietary API to users, which is not recommended: its generality is very poor, arguably non-existent. It is normally used only when it makes a feature easier to implement, or when the common interfaces have to be changed or extended substantially.

The second option is the general-purpose one. Java applications usually connect to the database through JDBC, so the data layer can itself be a JDBC implementation, i.e. it exposes the JDBC interface to the application. The cost to the application is then very low: it is used in the same way as the remote database's JDBC driver, so the migration cost is also very low.

The third option is based on an ORM or ORM-like interface and sits between the previous two. For efficient and convenient development, applications generally access the database through ORM or ORM-like frameworks such as iBatis, Hibernate, or Spring JDBC. We can wrap a layer around the ORM framework the application already uses, implement the data layer's functionality there, and still expose the original framework's interface to the outside. This is relatively cheap for some features and has compatibility advantages; for example, if the existing system uses iBatis, a wrapper around iBatis is quite transparent to the application.


On the other hand, the limitations of the ORM/ORM-like framework itself can cause difficulties. For example, dynamically changing the SQL is harder with iBatis than it is in an implementation built directly on the JDBC driver.

(2) The data layer design, following the order of its processing flow

[Figure: the data layer's processing flow]
[Processing at the SQL parsing stage]
1: How much SQL to support, and whether all SQL must be supported, depends on the specific scenario.
2: How many SQL dialects to support, i.e. how much of each vendor's extensions beyond standard SQL, also has to be decided according to the specific scenario.
3: During SQL parsing, a parsing cache can speed things up.
4: SQL parsing yields the key information in the statement, such as table names, fields, and where conditions. A very important job of the data layer is to determine, from the executed SQL, which table is being operated on, and then to select the target data source connection based on the parameters and the rules.

[Rule processing stage]
1: Use a fixed hash algorithm as the rule.
The common approach is to take a modulus of some field of the table (for example the user id) and spread the data across different databases and tables accordingly. This is acceptable for systems whose business is fixed and which rarely need to scale out, but it is unfriendly for systems with complex business that will need to expand later (a minimal sketch follows).
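A minimal sketch of this fixed-hash rule; the shard counts and the way the table index is derived are illustrative choices.

```java
// Illustrative fixed-hash routing: user_id modulo the number of databases/tables.
class ModuloRule {
    private final int dbCount;     // e.g. 4 databases
    private final int tableCount;  // e.g. 16 tables per database

    ModuloRule(int dbCount, int tableCount) {
        this.dbCount = dbCount;
        this.tableCount = tableCount;
    }

    int dbIndex(long userId)    { return (int) (userId % dbCount); }
    int tableIndex(long userId) { return (int) ((userId / dbCount) % tableCount); }
    // Drawback: changing dbCount later forces most existing rows to move to a new shard.
}
```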

2: Use a consistent hashing algorithm.
The biggest change consistent hashing brings is that each node is responsible for a range of hash values rather than a set of discrete values. In consistent hashing, the overall range of hash values is defined to be very large and then divided among the existing nodes. If a node joins, the new node takes over part of the hash range from one existing node; if a node leaves, the range it managed is taken over by the next node. Suppose the hash range is 0 to 100 and there are four nodes; the ranges they manage are [0,25), [25,50), [50,75), and [75,100]. If the second node leaves, the remaining ranges become [0,25), [25,75), and [75,100]: the data managed by the first and fourth nodes is unaffected, the data the third node already manages is also unaffected, and the third node simply takes over the data the second node was responsible for. If instead a node is added, say between the second and the third, the five nodes manage [0,25), [25,50), [50,63), [63,75), and [75,100]: the first, second, and last nodes are unaffected, part of the third node's data is unaffected, and the rest of it is handed over to the new node.


Problem: when a new node is added, only one existing node is affected, and the load of the new node and that affected node is clearly lower than that of the other nodes; when a node is removed, again only one node is affected, and it has to take on its own work plus the removed node's work, so its load is clearly higher than the others'. It seems the only way to keep the nodes balanced is to double the number of nodes or remove half of them, and in that case the advantage of consistent hashing is not obvious.

3: Improving consistent hashing with virtual nodes
To deal with the problem above, we introduce virtual nodes: the 4 physical nodes become many virtual nodes, each responsible for a segment of the continuous hash ring. Now, if a physical node is added, many virtual nodes are added with it, and these new virtual nodes are inserted fairly evenly around the ring, so the load of the existing physical nodes is shared well. If a physical node is removed, the many virtual nodes corresponding to it disappear, and the remaining virtual nodes take over their work; for the physical nodes, the extra load is spread fairly evenly. So by mapping each physical node to a large number of virtual nodes, and spreading the virtual nodes of the same physical node as evenly as possible around the ring, the load imbalance when adding or removing nodes is solved. A sketch of such a ring follows.
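Below is a minimal sketch of a consistent-hash ring with virtual nodes, using a sorted map as the ring. The hash function (first 8 bytes of an MD5 digest) and the replica count are illustrative choices, not any specific library's API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative consistent-hash ring with virtual nodes.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // virtual-node hash -> physical node
    private final int virtualNodesPerNode;

    ConsistentHashRing(int virtualNodesPerNode) {
        this.virtualNodesPerNode = virtualNodesPerNode;
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodesPerNode; i++) {
            ring.put(hash(node + "#" + i), node);   // spread this node's virtual nodes around the ring
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodesPerNode; i++) {
            ring.remove(hash(node + "#" + i));      // its keys fall to the next virtual nodes clockwise
        }
    }

    String nodeFor(String key) {
        if (ring.isEmpty()) return null;
        // Walk clockwise: first virtual node with hash >= the key's hash, wrapping to the start.
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF); // first 8 digest bytes as a long
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```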

[Rewrite SQL]

1: Rewrite table names: the logical table name has to be changed to the physical table name in the target database, i.e. the sharded table name after a database is split into multiple tables (see the sketch below).
2: Rewrite index names: index names may also change after the databases and tables are split.
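A naive string-based illustration of point 1; a real data layer would rewrite the parsed SQL tree rather than the raw SQL text, and the naming convention here is made up.

```java
// Naive illustration of rewriting a logical table name to its physical shard name.
// Assumes the logical table name contains no regex metacharacters.
class TableNameRewriter {
    static String rewrite(String sql, String logicalTable, int tableIndex) {
        String physicalTable = logicalTable + "_" + String.format("%02d", tableIndex);
        return sql.replaceAll("\\b" + logicalTable + "\\b", physicalTable);
    }
}

// e.g. rewrite("SELECT * FROM orders WHERE user_id = ?", "orders", 3)
//      -> "SELECT * FROM orders_03 WHERE user_id = ?"
```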

(7) Challenges of read/write separation and how to respond

(1) Introduction

[Figure: a common read/write-separation scenario with one master and several slaves]

The figure above shows a common scenario for read/write separation. By separating reads from writes, the read pressure on the master library (Master) can be shared among the slaves. This introduces a data replication problem: the master's data has to be copied to the slave libraries (Slaves).
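A minimal sketch of how a data layer might route statements in this setup: writes (and reads that must see the latest data) go to the master, other reads go to a random slave. The class and parameter names are illustrative; in a Spring application the same idea is often built on a routing DataSource.

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative read/write routing: writes go to the master, ordinary reads go to a random slave.
class ReadWriteRouter {
    private final DataSource master;
    private final List<DataSource> slaves;

    ReadWriteRouter(DataSource master, List<DataSource> slaves) {
        this.master = master;
        this.slaves = slaves;
    }

    Connection connectionFor(String sql, boolean forceMaster) throws SQLException {
        boolean isRead = sql.trim().toLowerCase().startsWith("select");
        if (!isRead || forceMaster || slaves.isEmpty()) {
            return master.getConnection();   // writes, and reads that must not see replication lag
        }
        DataSource slave = slaves.get(ThreadLocalRandom.current().nextInt(slaves.size()));
        return slave.getConnection();        // ordinary reads share the load across the slaves
    }
}
```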

(2) Identical data structure, multiple slave libraries for one master library

1: For this kind of symmetric master/slave deployment structure, the simpler way is to use the synchronization mechanism provided by the database system itself. For example, MySQL Replication solves the replication problem with relatively small delay.

[Figure: a master library with a more complicated, combined (sharded) deployment structure]
2: What if the master library has a more complicated, combined deployment structure like the one shown in the figure above?
The application accesses the database through the data layer and, via a messaging system, sends a notification whenever the database is updated. On receiving the notification, a data synchronization server copies the data. The sharding rule configuration tells both the data layer (when reading) and the data synchronization server (when updating the target databases) how the data is split. The synchronization server interacts with the master database mainly by fetching the rows identified by the primary keys of the modified or newly inserted data, i.e. it uses row-based copying.
This is not an elegant approach, but it solves the problem. A more elegant approach is to replicate data based on the database log.

(3) Data replication when the master and target databases are sharded differently

Database replication is a critical task in read/write separation. Usually the copy is symmetric, i.e. a mirror, but there are scenarios where replication is asymmetric. Asymmetric replication means the source data and the target data are not mirrors of each other, and it can also mean that the source and target databases are organized differently. In that case we cannot simply copy the data; we have to control how the data is distributed and process it for the actual business scenario.

(4) Introduce a data change platform

Copying data to other databases is one scenario that consumes data changes; other scenarios care about data changes too, such as building search-engine indexes and invalidating caches. We can therefore consider building a common platform to manage and distribute data changes.

(5) How to achieve smooth data migration

The biggest challenge of a smooth database migration is that data keeps changing while the migration is in progress. One workable approach is to start recording an incremental change log when the migration begins, and to process the incremental changes after the full copy is done. At the end, writes to the data being migrated are briefly suspended so that the incremental log can be fully processed, then the routing rules are switched, writes are re-enabled, and the migration is complete.

Reference process steps:
1: First, once we have decided to start the expansion, we begin recording the incremental log of data changes in the database.

[Figure: initial state — the incremental log and the new database tables are still empty]

At this point both the incremental log and the new database tables are still empty. We use id to identify a record and v for its version number (v is not a business field of the table, just a marker added here to explain the migration process).

2: Next, data starts being copied to the new database tables, while updates keep arriving at the same time.

[Figure: copy in progress — id=1 and id=3 are already in the new table, but id=1 carries an old version]
As you can see, the rows with id=1 and id=3 are already in the new table, but the id=1 row carries the old version while the id=3 row already has the new version. By the time all the data in the source table has been copied to the new table, some rows in the new table will not be the latest, because changes kept happening during the copy.

3: When the full copy is finished, we then apply the data recorded in the incremental log.

[Figure: after replaying the incremental log — the new tables are close to, but not exactly, the source data]

Note that this still does not guarantee that the new tables are consistent with the source tables, because new incremental log entries keep arriving while we are processing the old ones. It is a gradually converging process.
4: Next we compare the data. There may still be differences between the new database and the source database at this point; record them.
5: Then we stop writes to the data being migrated in the source database and process the remaining incremental log entries, so that the new tables hold the latest data.
6: Finally we update the routing rules so that all reads and writes go to the new database tables, completing the migration.
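The six steps above can be summarized in an outline like the following. The interfaces are hypothetical stand-ins for the real data layer, log store, and routing configuration, and the convergence threshold is an arbitrary illustrative value.

```java
// Illustrative outline of the smooth-migration steps described above.
interface IncrementalLog { void startRecording(); int size(); void drainToNewDb(); }
interface Router { void pauseWritesOnOldDb(); void switchToNewDb(); void resumeWrites(); }
interface Copier { void copyAllRows(); void verifyAndRecordDiffs(); }

class SmoothMigration {
    void migrate(Copier copier, IncrementalLog log, Router router) {
        log.startRecording();          // 1. start recording incremental change logs
        copier.copyAllRows();          // 2. full copy while updates keep flowing in
        while (log.size() > 100) {     // 3. replay increments until the backlog is small
            log.drainToNewDb();
        }
        copier.verifyAndRecordDiffs(); // 4. compare old and new data, note the differences
        router.pauseWritesOnOldDb();   // 5. briefly stop writes to the data being migrated
        log.drainToNewDb();            //    and replay the last increments
        router.switchToNewDb();        // 6. route reads and writes to the new tables
        router.resumeWrites();
    }
}
```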

Origin: blog.csdn.net/Octopus21/article/details/109956587