3.1.2 Database Architecture: Distributed Database, Distributed Database Characteristics, Distributed Database Structure, Data Fragmentation, Distributed Database Transaction

Distributed database

Distributed databases are best understood in contrast to centralized databases. A centralized database, which is how systems such as Oracle and SQL Server are traditionally deployed, keeps all of its data on a single database server. A distributed database physically spreads its data across different nodes.

Because the data sits on different physical nodes, a request can be served by a nearby node, which can make access more efficient than with a centralized database. Data can also be placed according to need, and the load can be balanced across nodes.

When distributing data, you can also store several copies of the same data on different nodes. The copies act as backups and improve the reliability of the database.

Distributed databases therefore offer capabilities that a centralized database does not.

Distributed database characteristics

  • Data independence
    In addition to the logical and physical independence of data, there is also data distribution independence (distribution transparency).

  • Centralized and autonomous shared control structure
    Each local DBMS (database management system) manages its local database independently and is autonomous. At the same time, the system has a centralized control mechanism that coordinates the work of the local DBMSs and executes global applications.

  • Appropriately increase data redundancy
    Storing multiple copies of the same data at different sites can improve system reliability and availability, as well as performance. (When one node fails, copies of the data still exist at the sites that have not failed, so the data remains available to all other sites.)

  • Global consistency, serializability and recoverability
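The availability benefit of redundancy can be illustrated with a minimal sketch. The `Site` class and `read_with_failover` function are hypothetical names introduced only for this example, and the sketch assumes the replicas already hold identical copies; it does not show how copies are kept consistent.

```python
class Site:
    """A hypothetical site that stores a copy of some data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.up = True  # set to False to simulate a site failure

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"site {self.name} is down")
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in order and return the first successful read."""
    for site in replicas:
        try:
            return site.read(key)
        except ConnectionError:
            continue  # this copy is unreachable; try the next one
    raise RuntimeError("no replica available")


replicas = [Site("A", {"x": 1}), Site("B", {"x": 1})]
replicas[0].up = False  # simulate a failure of site A
value = read_with_failover(replicas, "x")  # served by site B
```

The read still succeeds after site A fails because site B holds another copy. If every replica is down, the function raises an error instead of returning stale or partial data.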

Distributed database structure

In a distributed database, the global external schema is the user view of a global application, and it is what user programs see directly.

The conceptual schema exists at two levels: a global conceptual schema and local conceptual schemas. A conceptual schema corresponds to the database tables. In a distributed database, a table can be defined globally, or it can be split into pieces that are stored as local tables, which is why both levels are needed.

Fragmentation schema: how is the data divided? The fragmentation schema describes how a global table is split into fragments, which can be done horizontally, vertically, or with a mix of both. Once the data has been divided, the next question is which physical nodes each fragment is placed on. That placement is described by the distribution schema.

Distribution schema: how are the fragments placed at local sites? The distribution schema records the site at which each fragment is stored. Once a fragment has been placed, each site has its own local conceptual schema and local internal schema, which look much like the conceptual and internal schemas of a centralized database. The actual data is stored in the local database at that site.
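One way to picture the two schemas is as two lookup tables: one from a global table to its fragments, and one from each fragment to the sites that store it. The sketch below assumes this simplified view; the table, fragment, and site names are all invented for illustration.

```python
# Fragmentation schema: global relation -> its fragments (hypothetical names).
fragmentation_schema = {
    "trips": ["trips_2023_01", "trips_2023_02"],
}

# Distribution schema: fragment -> sites that hold a copy of it.
distribution_schema = {
    "trips_2023_01": ["site_beijing"],
    "trips_2023_02": ["site_shanghai", "site_beijing"],
}


def sites_for_relation(relation):
    """Return every site that stores at least one fragment of a global relation."""
    sites = set()
    for fragment in fragmentation_schema[relation]:
        sites.update(distribution_schema[fragment])
    return sites


trip_sites = sites_for_relation("trips")
```

A global application only names `trips`; resolving that name through both tables is what tells the system which local databases to contact.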

Across these levels, a local DBMS manages each local database, and a global DBMS serves global applications.

Local database management system (LDBMS)
Global database management system (GDBMS)
Communication management (CM)
Distributed database management system (DDBMS)

Distributed database management system - composition

  • LDBMS
  • GDBMS
  • Global data dictionary
  • Communication management (CM)

Distributed database management system - structure

  • DDBMS with centralized global control
  • DDBMS with decentralized global control
  • DDBMS with partially decentralized global control

Data fragmentation

Distribution transparency

  • Fragmentation transparency
  • Replication transparency
  • Location transparency
  • Local mapping transparency

Fragmentation transparency: users do not need to care how the data is fragmented. They operate on the global relation, so the way it is fragmented is transparent to them.

Replication transparency: users do not need to care how data is replicated across the nodes of the network. The system updates the replicated copies automatically.

Location transparency: users do not need to know where the data they operate on is located. Which site or sites the data is allocated to and stored at is transparent to them.

Local mapping transparency (logical transparency): the lowest level of transparency, which provides the mapping from the data to the local databases. Users do not need to care which data model a local DBMS supports or which data manipulation language it uses; the system converts between data models and manipulation languages. Local mapping transparency is therefore especially important for heterogeneous distributed database systems, where the local DBMSs differ.

Fragmentation

A database table can be fragmented in three ways: horizontally, vertically, or with a hybrid of the two.

Horizontal fragmentation

Horizontal fragmentation cuts a table along its rows. Each row of a table is a tuple; the tuples are divided into subsets, and the subsets are placed on different physical nodes.

For example, subway trip records can be split into 12 fragments by month, or split by province and city and allocated to physical nodes close to the corresponding cities. The access path is shorter, so performance is better, and each node holds a smaller volume of data.
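The trip-record example can be sketched as follows. This is a minimal illustration that assumes rows are plain dictionaries; the `trips` data and the `horizontal_fragments` helper are invented for the example.

```python
from collections import defaultdict

# Hypothetical subway trip records; each dictionary is one tuple (row).
trips = [
    {"trip_id": 1, "city": "Beijing", "month": 1},
    {"trip_id": 2, "city": "Shanghai", "month": 1},
    {"trip_id": 3, "city": "Beijing", "month": 2},
]


def horizontal_fragments(rows, key):
    """Group whole rows into fragments by the value of one attribute."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[key]].append(row)
    return dict(fragments)


by_city = horizontal_fragments(trips, "city")
by_month = horizontal_fragments(trips, "month")

# The union of all fragments reconstructs the original relation.
rebuilt = sorted(
    (row for fragment in by_city.values() for row in fragment),
    key=lambda row: row["trip_id"],
)
```

Every row lands in exactly one fragment, and every fragment keeps all of the columns, so taking the union of the fragments gives back the full table.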

Vertical fragmentation

Vertical fragmentation cuts a table along its columns: different attribute columns are grouped into different subsets. Different users of the same table often need different columns. The columns that one group of users needs can be placed in the same subset on a node near them, while columns they do not need can be placed on more distant nodes, which also improves access efficiency.
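A minimal sketch of vertical fragmentation follows. It assumes that every fragment keeps the primary key (`emp_id` here) so that the original rows can be rebuilt by joining on it; the source does not spell this out, and the table and helper names are invented for the example.

```python
# Hypothetical employee table; emp_id is assumed to be the primary key.
employees = [
    {"emp_id": 1, "name": "Li", "dept": "R&D", "salary": 9000},
    {"emp_id": 2, "name": "Wang", "dept": "Sales", "salary": 8000},
]


def vertical_fragment(rows, key, columns):
    """Keep the key column plus the requested attribute columns of every row."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


def rebuild(key, *fragments):
    """Join vertical fragments on the key to reconstruct the full rows."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


hr_fragment = vertical_fragment(employees, "emp_id", ["name", "dept"])
payroll_fragment = vertical_fragment(employees, "emp_id", ["salary"])
restored = rebuild("emp_id", hr_fragment, payroll_fragment)
```

Here the HR columns and the payroll column could live on different nodes, each close to the users who need them, and the join on `emp_id` recovers the original table when a global application needs all columns.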

Hybrid fragmentation

Hybrid fragmentation cuts a table both vertically and horizontally. It restricts the attribute columns and the tuple rows at the same time, which produces smaller subsets to distribute.
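As a small sketch under the same dictionary-per-row assumption, a hybrid fragment can be produced by filtering rows with a predicate and then projecting columns. The `orders` data and the `hybrid_fragment` helper are hypothetical.

```python
# Hypothetical orders table.
orders = [
    {"order_id": 1, "region": "north", "amount": 30, "note": "gift"},
    {"order_id": 2, "region": "south", "amount": 50, "note": ""},
    {"order_id": 3, "region": "north", "amount": 20, "note": "rush"},
]


def hybrid_fragment(rows, predicate, columns):
    """Select the rows that match predicate, then keep only the given columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


# Rows restricted to the north region, columns restricted to id and amount.
north_amounts = hybrid_fragment(
    orders,
    lambda row: row["region"] == "north",
    ["order_id", "amount"],
)
```

The result contains only two of the three rows and two of the four columns, which is the smaller subset that the text describes.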

Distributed database transactions

A distributed database involves multiple local databases, and the application is usually a global one, so it has to read or write data at several sites. This raises a question: in such a series of operations, can one node complete its part while another node does not?

Distributed transaction management therefore requires the nodes to communicate and coordinate, so that a transaction takes effect either on all of the nodes involved or on none of them.

Two-phase commit protocol (2PC)

The two phases of a 2PC commit:

  • Voting phase: the goal is to reach a common decision.
  • Execution phase: the goal is to carry out the coordinator's decision.

The two global commit rules:

  • If even one participant votes to abort the transaction, the coordinator must make a global abort decision.
  • Only if all participants vote to commit can the coordinator make a global commit decision.
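The two phases can be sketched in a few lines. This is a simplified, in-memory illustration, not a production protocol: it assumes no timeouts, message loss, logging, or crash recovery, and the `Participant` class and `two_phase_commit` function are hypothetical names. The coordinator in this sketch commits only when every participant votes to commit and aborts otherwise.

```python
class Participant:
    """A hypothetical participant that votes and then applies the decision."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "initial"

    def prepare(self):
        """Voting phase: return True to vote commit, False to vote abort."""
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run both phases as the coordinator and return the global decision."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # A single abort vote forces a global abort.
    decision = "commit" if all(votes) else "abort"

    # Phase 2 (execution): every participant carries out the same decision.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision


ok_group = [Participant("A"), Participant("B")]
failed_group = [Participant("A"), Participant("B", can_commit=False)]
ok_decision = two_phase_commit(ok_group)
failed_decision = two_phase_commit(failed_group)
```

In the second group, participant A was ready to commit but still ends up aborted, which is the point of the protocol: no node completes the transaction while another does not.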

Origin blog.csdn.net/qq_41929714/article/details/129635168