System architecture design professional skills · Database design

1. Database concept

1.1 Data model

Data models are divided into: hierarchical model, network model, object-oriented model, and relational model.

The three elements of a data model are: data structure, data operations, and data constraints.

Data constraints include:
(1) Entity integrity
(2) Referential integrity
(3) User-defined integrity

1.2 Database view

A view does not actually exist in the database, but is a virtual table.
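To make the "virtual table" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the view name `adult_student`, and the sample rows are assumptions made up for illustration. The view stores only its defining query, so a change to the base table shows up the next time the view is queried.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_no INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        age        INTEGER,
        class_no   INTEGER
    );
    INSERT INTO student VALUES (1, 'Ann', 19, 101), (2, 'Bob', 21, 102);

    -- The view stores only its defining query, not a copy of the rows.
    CREATE VIEW adult_student AS
        SELECT student_no, name FROM student WHERE age >= 20;
""")

def adult_names(connection):
    """Return the names currently visible through the view."""
    rows = connection.execute("SELECT name FROM adult_student ORDER BY name")
    return [name for (name,) in rows]

before = adult_names(conn)
# Changing the base table changes what the view returns on the next query.
conn.execute("UPDATE student SET age = 20 WHERE student_no = 1")
after = adult_names(conn)
print(before, after)
```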

2. Database model

Databases generally adopt a three-level model. System developers hide the system's complexity from users and simplify the interaction between users and the system through abstraction at three levels: the view level, the logical level, and the physical level.
From the perspective of the database management system, these levels correspond to three schemas: the external schema, the conceptual schema, and the internal schema.
The database system provides two levels of mapping between the three schemas: the external schema/conceptual schema mapping and the conceptual schema/internal schema mapping. These two mappings give the data in the database a high degree of logical independence and physical independence.

Three-level schema of a database

| Schema | Description |
| --- | --- |
| External schema | Also called the subschema or user schema. It describes the logical structure of the part of the data that users see or use. Users operate on the data in the database through data manipulation statements or applications according to the external schema. |
| Conceptual schema | A description of the logical structure and characteristics of all data in the database; it is the common data view for all users. |
| Internal schema | A description of the physical structure and storage method of the data, that is, how data is represented inside the database. It defines all internal record types, indexes, and file organization methods. |

Two-level mapping of a database

| Independence | Description |
| --- | --- |
| Logical independence | Corresponds to the mapping between the external schema and the conceptual schema. The application is independent of the logical structure of the database: when the logical structure of the data changes, the application remains unchanged. |
| Physical independence | Corresponds to the mapping between the conceptual schema and the internal schema. The application and the data on disk are independent of each other: when the physical storage of the data changes, the application remains unchanged. |

3. Relational Database

3.1 Relational model

Representation of a relational schema
Form 1:
Student (student number, name, age, class number)

Form 2:
Student (U, F)
U = {student number, name, age, class number}
F = {student number → name, student number → age, student number → class number}

Basic concepts:
Degree (arity): the number of attributes in the relational schema.
Candidate key: an attribute or minimal group of attributes whose value uniquely identifies a tuple in the relation.
Primary key: if a relation has several candidate keys, one of them is chosen as the primary key.
Prime and non-prime attributes: attributes that belong to any candidate key are prime attributes; all others are non-prime attributes.
Foreign key: an attribute (or attribute group) that is not a key of this relation but is a key of another relation.
All-key: the whole attribute set of the relational schema is its candidate key.

Integrity constraints:

  • Entity integrity constraint: the prime attributes of a base relation cannot take null values.
  • Referential integrity constraint: governs references between relations; a foreign key must be either null or equal to a primary key value of the referenced relation.
  • User-defined integrity constraints: determined by the application environment.
  • Triggers
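The first three constraint kinds can be tried out with Python's built-in sqlite3 module. The following is only a sketch: the tables `school_class` and `student`, the age range in the CHECK clause, and the sample rows are assumptions for illustration. Text keys are used because SQLite silently auto-assigns a value when NULL is inserted into an INTEGER PRIMARY KEY column, and foreign keys are checked only after the `PRAGMA foreign_keys = ON` statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE school_class (
        class_no TEXT PRIMARY KEY NOT NULL
    );
    CREATE TABLE student (
        student_no TEXT PRIMARY KEY NOT NULL,                -- entity integrity
        name       TEXT NOT NULL,
        age        INTEGER CHECK (age BETWEEN 0 AND 150),    -- user-defined integrity
        class_no   TEXT REFERENCES school_class (class_no)   -- referential integrity
    );
    INSERT INTO school_class VALUES ('C101');
""")

def try_insert(row):
    """Insert a student row; return 'ok' or the reason it was rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert(("S1", "Ann", 19, "C101")),  # valid row
    try_insert((None, "Bob", 20, "C101")),  # null primary key
    try_insert(("S1", "Cid", 20, "C101")),  # duplicate primary key
    try_insert(("S2", "Dee", 20, "C999")),  # referenced class does not exist
    try_insert(("S3", "Eve", 20, None)),    # a null foreign key is allowed
    try_insert(("S4", "Fay", -5, "C101")),  # violates the CHECK rule
]
for outcome in results:
    print(outcome)
```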

3.1 Relational operations

Union (∪): The union of relations R and S is the set of tuples that belong to R or to S.

Intersection (∩): The intersection of relations R and S is the set of tuples that belong to both R and S.

Difference (−): The difference between relations R and S is the set of tuples that belong to R but not to S.

Cartesian product (×): The Cartesian product of two relations R and S with n and m columns respectively is a set of tuples with (n + m) columns. The first n columns of each tuple are a tuple of R and the last m columns are a tuple of S. It is denoted R × S. If R and S have attributes with the same name, the relation name can be prefixed to the attribute name to tell them apart. If R has K1 tuples and S has K2 tuples, then R × S has K1 × K2 tuples.

Selection (σ): Returns the rows (tuples) of the relation R that satisfy a given condition.

Projection (π): Returns the specified columns (attributes) of the relation R.

Join (⋈):
Equijoin: selects from the Cartesian product of R and S the tuples whose compared attribute values are equal.
Natural join: a special equijoin that requires the compared attributes to be the same attribute group in both relations, and removes the duplicate attributes from the result.
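The operations above can be sketched in a few lines of Python. This is an illustration under assumed conventions, not a database implementation: a relation is modelled as a tuple of attribute names plus a set of value tuples, and the helper names and sample data are made up.

```python
from itertools import product

# Assumed representation: a relation is a tuple of attribute names plus a
# set of value tuples. Sample data is made up for illustration.
R = {(1, "x"), (2, "y")}
S = {(2, "y"), (3, "z")}

union = R | S            # tuples in R or in S
intersection = R & S     # tuples in both R and S
difference = R - S       # tuples in R but not in S
cartesian = {r + s for r, s in product(R, S)}  # |R| * |S| tuples, n + m columns

def select(attrs, rows, predicate):
    """Selection: keep the rows for which predicate(row_as_dict) is true."""
    return {row for row in rows if predicate(dict(zip(attrs, row)))}

def project(attrs, rows, keep):
    """Projection: keep only the named columns (duplicates collapse)."""
    positions = [attrs.index(name) for name in keep]
    return {tuple(row[i] for i in positions) for row in rows}

def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """Natural join: equijoin on the shared attribute names, listed once."""
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in common]
    joined = set()
    for r, s in product(r_rows, s_rows):
        r_dict, s_dict = dict(zip(r_attrs, r)), dict(zip(s_attrs, s))
        if all(r_dict[a] == s_dict[a] for a in common):
            joined.add(r + tuple(s_dict[a] for a in s_only))
    return tuple(r_attrs) + tuple(s_only), joined

STUDENT_ATTRS = ("student_no", "name", "dept_no")
STUDENT = {(1, "Ann", 10), (2, "Bob", 20)}
DEPT_ATTRS = ("dept_no", "dept_name")
DEPT = {(10, "Math"), (30, "Physics")}

in_dept_10 = select(STUDENT_ATTRS, STUDENT, lambda row: row["dept_no"] == 10)
names = project(STUDENT_ATTRS, STUDENT, ["name"])
joined_attrs, joined_rows = natural_join(STUDENT_ATTRS, STUDENT, DEPT_ATTRS, DEPT)
```

In this sample, Bob's department 20 has no matching row in DEPT, so the natural join keeps only Ann's tuple.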


3.1 Basic theory of relational data design

The goal of relational database design is to generate a set of appropriate and well-performing relational schemas that reduce the redundancy of information storage in the system but allow easy access to information.

3.1.1 Functional dependencies

Let R(U, F) be a relational schema over the attribute set U, let X and Y be subsets of U, and let r be any relation of R. If for any two tuples u and v in r, u[X] = v[X] implies u[Y] = v[Y], then X is said to functionally determine Y, or Y to functionally depend on X, denoted X → Y. This is called a functional dependency.
For example: student number → department number, department number → department name
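The definition can be checked mechanically against one relation instance. The sketch below assumes rows are stored as Python dicts, and the function name `fd_holds` and the sample data are hypothetical. Note that an instance can only refute a dependency: a dependency of the schema must hold for every possible instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this relation instance.

    The dependency fails as soon as two rows agree on every attribute in
    lhs but differ on some attribute in rhs.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

# Made-up sample instance.
rows = [
    {"student_no": 1, "dept_no": 10, "dept_name": "Math"},
    {"student_no": 2, "dept_no": 10, "dept_name": "Math"},
    {"student_no": 3, "dept_no": 20, "dept_name": "Physics"},
]

print(fd_holds(rows, ["student_no"], ["dept_no"]))  # no two rows share a student_no
print(fd_holds(rows, ["dept_no"], ["dept_name"]))   # each dept_no has one dept_name
print(fd_holds(rows, ["dept_no"], ["student_no"]))  # dept_no 10 has two students
```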


3.1.2 Key/Candidate Key

  • Prime and non-prime attributes: attributes that belong to any candidate key are prime attributes; all others are non-prime attributes.

Finding candidate keys

  • Represent the functional dependencies of the relational schema as a directed graph.
  • Find the attributes with an in-degree of 0 and, starting from this attribute set, try to traverse the directed graph. If every node in the graph can be reached, this attribute set is a candidate key of the relational schema.
  • If the attributes with an in-degree of 0 cannot reach every node, add intermediate nodes (nodes with both in-degree and out-degree) to the set, as few as needed, until it can reach every node; the resulting set is a candidate key.
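The procedure above can be sketched in Python using attribute closure, the set-based counterpart of the graph traversal: the closure of an attribute set is everything reachable from it through the dependencies. The function names and the brute-force search over extra attributes are assumptions for illustration, and the search is exponential, so it is only suitable for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes reachable from attrs by following the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(universe, fds):
    """All minimal attribute sets whose closure is the whole universe."""
    universe = set(universe)
    # Attributes that never appear on a right-hand side (in-degree 0)
    # must belong to every candidate key.
    core = universe - {a for _, rhs in fds for a in rhs}
    others = sorted(universe - core)
    keys = []
    # Try adding 0, 1, 2, ... other attributes, smallest sets first.
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            candidate = core | set(extra)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == universe:
                keys.append(candidate)
    return keys

# A and D have in-degree 0 and together reach every attribute.
keys_1 = candidate_keys({"A", "B", "C", "D"}, [({"A"}, {"B"}), ({"B"}, {"C"})])
# No attribute has in-degree 0, so intermediate nodes must be tried.
keys_2 = candidate_keys({"A", "B", "C"},
                        [({"A"}, {"B"}), ({"B"}, {"A"}), ({"A"}, {"C"})])
print(keys_1, keys_2)
```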

3.1.3 Axiom of functional dependence (Armstrong’s axiom)

From known functional dependencies, other functional dependencies can be deduced, which requires a set of inference rules. These rules are often called "Armstrong's axioms".

Given a relational schema R(U, F), where U is the attribute set of R and F is a set of functional dependencies over U, the following three inference rules hold:
(1) Reflexivity: if Y ⊆ X ⊆ U, then X → Y is entailed by F.
(2) Augmentation: if Z ⊆ U and X → Y is entailed by F, then XZ → YZ is entailed by F.
(3) Transitivity: if X → Y and Y → Z are entailed by F, then X → Z is entailed by F.

From the inference rules above, the following three rules can be derived:
(1) Merging (union) rule: if X → Y and X → Z, then X → YZ is entailed by F.
(2) Pseudo-transitivity rule: if X → Y and WY → Z, then XW → Z is entailed by F.
(3) Decomposition rule: if X → Y and Z ⊆ Y, then X → Z is entailed by F.

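The proofs are as follows:
(1) Merging rule: augmenting X → Y with X gives X → XY; augmenting X → Z with Y gives XY → YZ; by transitivity, X → YZ.
(2) Pseudo-transitivity rule: augmenting X → Y with W gives XW → WY; together with WY → Z, transitivity gives XW → Z.
(3) Decomposition rule: since Z ⊆ Y, reflexivity gives Y → Z; together with X → Y, transitivity gives X → Z.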

3.1.4 Normalization theory

One method of relational database design is to make the schemas satisfy an appropriate normal form. The degree of normalization of a design can usually be evaluated by determining which normal form the decomposed schemas reach. The normal forms include 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF.
What impact does a higher level of normalization bring? It reduces data redundancy and update anomalies, but it splits the data across more relations, so queries need more joins.

(1) First normal form (1NF): A relational schema R is in first normal form if and only if all of its fields contain only atomic values, that is, every attribute is an indivisible data item.

For example, a relation does not satisfy 1NF if its attribute "number of senior professional titles" is further divided into "professors" and "associate professors".

(2) Second normal form (2NF): A relational schema R is in second normal form if R ∈ 1NF and every non-prime attribute fully depends on the candidate key (there is no partial dependency).

For example, a relation does not satisfy 2NF if the course number alone determines the credits while the course number is only part of the key.

(3) Third normal form (3NF): A relational schema R is in third normal form if R ∈ 2NF and no non-prime attribute transitively depends on the candidate key.

For example, a relation does not satisfy 3NF if the department name and department location depend on the department number, which in turn depends on the key.

(4) Boyce-Codd normal form (BCNF): Let R be a relational schema and F its set of functional dependencies. R is in BCNF if and only if the determinant (left-hand side) of every nontrivial dependency in F contains a candidate key of R.
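As an illustration of why normalization decomposes relations, the sketch below removes a transitive dependency (student number → department number → department name) by splitting one relation into two, then checks that a natural join restores the original rows, meaning the decomposition is lossless for this instance. The relation, attribute names, and helper functions are assumptions made up for the example.

```python
# Hypothetical instance of student(student_no, dept_no, dept_name), where
# student_no -> dept_no and dept_no -> dept_name (a transitive dependency).
student = [
    {"student_no": 1, "dept_no": 10, "dept_name": "Math"},
    {"student_no": 2, "dept_no": 10, "dept_name": "Math"},
    {"student_no": 3, "dept_no": 20, "dept_name": "Physics"},
]

def project(rows, attrs):
    """Projection with duplicate elimination, keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(left, right):
    """Join two lists of dict rows on all attribute names they share."""
    out = []
    for l_row in left:
        for r_row in right:
            common = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in common):
                out.append({**l_row, **r_row})
    return out

# Decompose along the transitive dependency.
student_dept = project(student, ["student_no", "dept_no"])
dept = project(student, ["dept_no", "dept_name"])

# Each department name is now stored once instead of once per student.
rejoined = natural_join(student_dept, dept)
print(len(dept), rejoined == student)
```

The join is lossless here because the shared attribute, dept_no, is a key of the `dept` relation.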

3.1.5 Schema decomposition (dependency preservation and lossless join)

4. Database design

The basic steps of database design are: user requirements analysis, conceptual structure design, logical structure design, physical structure design, database implementation (application design), and operation and maintenance.

4.1 Conceptual structural design

4.1.1 E-R model

The entity-relationship (E-R) model, usually drawn as an E-R diagram, is a practical tool for describing the conceptual world and building conceptual models. The three elements of an E-R diagram are:
(1) Entity: represented by a rectangle, with the entity name written inside.
(2) Attribute: represented by an ellipse and connected to its entity by a line.
(3) Relationship between entities: represented by a diamond, with the relationship name written inside. The diamond is connected to the related entities by lines, and the relationship type is marked on the lines.

4.1.2 Relationships between two different entities in an E-R diagram

A relationship between two entity types is one-to-one (1:1), one-to-many (1:n), or many-to-many (m:n).

4.1.3 Process of conceptual structure design

  • Integration methods:
    One-time integration: multiple local E-R diagrams are integrated at once.
    Step-by-step integration: two local E-R diagrams are integrated at a time, in an accumulative manner.

  • Conflicts caused by integration and their resolution:
    Attribute conflicts: attribute domain conflicts and attribute value conflicts.
    Naming conflicts: homonyms (same name, different meaning) and synonyms (different names, same meaning).
    Structural conflicts: the same object is abstracted differently in different applications, and the same entity has different numbers or orders of attributes in different local E-R diagrams.

4.2 Logical structure design

  • Conversion of the E-R diagram to the relational model
    Conversion of entities to relational schemas
    Conversion of relationships to relational schemas
  • Normalization of the relational schemas
  • Determination of integrity constraints (to guarantee data correctness)
  • Determination of user views (to improve data security and independence)
    Determine the views of the processing procedures based on the data flow diagram
    Determine the views used by different users based on the user category
  • Application design

☆ Each entity type must be converted into a relational schema.
☆ Converting relationships into relational schemas:

  • (1) A one-to-one relationship can be converted in two ways:
    Independent relational schema: consists of the primary keys of both ends plus the relationship's own attributes. (Primary key: the primary key of either end)
    Merge (into either end): add the primary key of the other end and the relationship's own attributes to that end's schema. (Primary key: unchanged)

  • (2) A one-to-many relationship can be converted in two ways:
    Independent relational schema: consists of the primary keys of both ends plus the relationship's own attributes. (Primary key: the primary key of the "many" end)
    Merge (into the "many" end): add the primary key of the "one" end and the relationship's own attributes to the "many" end's schema. (Primary key: unchanged)

  • (3) A many-to-many relationship can be converted in only one way:
    Independent relational schema: consists of the primary keys of both ends plus the relationship's own attributes. (Primary key: the combination of the primary keys of both ends)
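The conversion rules can be illustrated with a small schema built through Python's sqlite3 module. This is a sketch under assumed names: the entities `department`, `student`, and `course`, the relationship table `enrollment`, and its attribute `grade` are all hypothetical. The 1:n relationship is merged into the "many" end, and the m:n relationship gets its own table with a composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    -- Entities (hypothetical): each entity type becomes its own table.
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE course     (course_no INTEGER PRIMARY KEY, title TEXT);

    -- 1:n "department has students", merged into the "many" end:
    -- student keeps its own primary key and gains dept_no as a foreign key.
    CREATE TABLE student (
        student_no INTEGER PRIMARY KEY,
        name       TEXT,
        dept_no    INTEGER REFERENCES department (dept_no)
    );

    -- m:n "student takes course" needs an independent table whose primary
    -- key combines both ends' keys; grade is the relationship's own attribute.
    CREATE TABLE enrollment (
        student_no INTEGER REFERENCES student (student_no),
        course_no  INTEGER REFERENCES course (course_no),
        grade      INTEGER,
        PRIMARY KEY (student_no, course_no)
    );

    INSERT INTO department VALUES (10, 'Math');
    INSERT INTO course VALUES (1, 'Algebra'), (2, 'Logic');
    INSERT INTO student VALUES (1, 'Ann', 10);
    INSERT INTO enrollment VALUES (1, 1, 90), (1, 2, 85);
""")

def duplicate_enrollment_rejected():
    """The composite key allows one row per (student, course) pair."""
    try:
        with conn:
            conn.execute("INSERT INTO enrollment VALUES (1, 1, 70)")
    except sqlite3.IntegrityError:
        return True
    return False

courses_of_ann = [title for (title,) in conn.execute("""
    SELECT c.title
    FROM enrollment e JOIN course c USING (course_no)
    WHERE e.student_no = 1
    ORDER BY c.title
""")]
print(courses_of_ann, duplicate_enrollment_rejected())
```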

4.3 Concurrency control

4.3.1 ACID characteristics of transactions

4.4 Database security

4.5 Database backup and recovery

4.6 Database performance optimization

5. Interaction between application and database

6. NoSQL database

NoSQL (Not only SQL): With the rise of Web 2.0 websites, traditional relational databases have struggled to cope, especially with very large, highly concurrent, purely dynamic SNS-type Web 2.0 sites, and have exposed many problems that are difficult to overcome. Non-relational databases, by contrast, have developed very rapidly because of their own characteristics.

| Aspect | Relational database | NoSQL |
| --- | --- | --- |
| Concurrency support | Supports concurrency, but with low efficiency | High concurrency performance |
| Storage and query | Relational table storage, SQL queries | Massive data storage, high query efficiency |
| Scaling mode | Scale up | Scale out |
| Index mode | B-tree, hash, etc. | Key-value index |
| Application areas | General-purpose | Specific application areas |
| Classification | Typical application scenarios | Data model | Advantages | Disadvantages | Examples |
| --- | --- | --- | --- | --- | --- |
| Key-value | Content caching, mainly for handling high access loads on large amounts of data; also used in some logging systems | A key that points to a value, usually implemented with a hash table | Fast lookup | Data is unstructured and usually treated only as string or binary data | Redis, Tokyo Cabinet/Tyrant, Voldemort, Oracle BDB |
| Column store | Distributed file systems | Data is stored in column families, keeping the data of the same column together | Fast lookup, strong scalability, easier distributed expansion | Functionality is relatively limited | HBase, Cassandra, Riak |
| Document | Web applications (similar to key-value, but the value is structured and the database can understand its content) | Key-value pairs whose value is structured data | Loose data structure requirements; the table structure is variable and need not be predefined as in a relational database | Query performance is not high; no unified query syntax | CouchDB, MongoDB |
| Graph | Social networks, recommendation systems, and similar; focused on building relationship graphs | Graph structure | Can use graph algorithms, such as shortest path and N-degree relationship search | The whole graph often has to be computed to obtain the required information; the structure is not well suited to distributed clusters | Neo4j, InfoGrid, InfiniteGraph |

7. Distributed database

8. Database optimization technology

9. Distributed caching technology: Redis

Origin blog.csdn.net/weixin_30197685/article/details/132178137