[MySQL]-[Database design specifications]

Why database design is needed


Normal forms

Introduction to normal forms

In relational databases, the basic principles and rules for designing data tables are called normal forms. A normal form can be understood as the level of a design standard that a table's structure must meet. To design a reasonably structured relational database, its tables must satisfy a certain normal form.

Which normal forms are there?

Relational databases commonly use six normal forms. From the lowest level to the highest, they are: first normal form (1NF), second normal form (2NF), third normal form (3NF), Boyce-Codd normal form (BCNF), fourth normal form (4NF) and fifth normal form (5NF, also known as the perfect normal form).

The concept of keys and related attributes

Super key: a set of attributes that uniquely identifies a row of data. Adding any field to the primary key produces a super key.
Candidate key: a minimal super key, that is, a super key from which no attribute can be removed without losing uniqueness. A table can have several candidate keys (a person's ID card number is a typical one), but only one of them is chosen as the primary key.
Primary attributes: the attributes contained in any candidate key. All other attributes are non-primary attributes.

Example: Here are two tables:

  1. Player table (player): player number | name | ID number | age | team number
  2. Team table (team): team number | head coach | team location

Super key: for the player table, a super key is any combination of attributes that contains the player number or the ID number, such as (player number), (player number, name), (ID number, age), and so on.
Candidate key: the smallest super key. For the player table, the candidate keys are (player number) and (ID number).
Primary key: chosen by us from the candidate keys, for example (player number).
Foreign key: the team number in the player table, which references the team table.
Primary and non-primary attributes: in the player table, the primary attributes are player number and ID number. The other attributes (name, age, team number) are non-primary attributes.
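The key definitions above can be checked mechanically. The Python sketch below computes attribute closures and enumerates candidate keys by brute force. The snake_case column names are hypothetical. The functional dependencies it encodes (player number and ID number each determine every other column) are an assumption that follows from treating both as candidate keys.

```python
from itertools import combinations

# Assumed functional dependencies of the player table: (determinant, dependents).
# Column names are hypothetical snake_case versions of the fields above.
ATTRS = ["player_no", "name", "id_no", "age", "team_no"]
FDS = [
    ({"player_no"}, {"name", "id_no", "age", "team_no"}),
    ({"id_no"}, {"player_no", "name", "age", "team_no"}),
]


def closure(attrs, fds):
    """Return the set of attributes functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """A super key determines every attribute of the table."""
    return closure(attrs, fds) >= set(all_attrs)


def candidate_keys(all_attrs, fds):
    """Minimal super keys, found by trying attribute sets from smallest to largest."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            candidate = set(combo)
            # Supersets of keys already found are super keys, but not minimal ones.
            if any(key <= candidate for key in keys):
                continue
            if is_superkey(candidate, all_attrs, fds):
                keys.append(candidate)
    return keys


keys = candidate_keys(ATTRS, FDS)
primary_attrs = set().union(*keys)
print(keys)                                # [{'player_no'}, {'id_no'}]
print(sorted(primary_attrs))               # ['id_no', 'player_no']
print(sorted(set(ATTRS) - primary_attrs))  # ['age', 'name', 'team_no']
```

The search is exponential in the number of columns. That is acceptable for tables of this size but not intended for large schemas.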

First normal form (1st NF)

1. First normal form: it ensures that the value of each field in a table is atomic. Each field value is the smallest unit of data and cannot be split further.
2. When we design a field X, it must not be something that could be split into fields X-1 and X-2. In fact, any DBMS meets the requirements of the first normal form and will not split fields.
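A common violation of atomicity is packing several values into one column. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. The `contact_bad` and `contact_phone` tables and the phone values are hypothetical. The sketch moves a comma-separated phone list into one row per phone number, so that every field holds a single value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical non-atomic design: one column packs several phone numbers.
conn.execute("CREATE TABLE contact_bad (user_id INTEGER PRIMARY KEY, phones TEXT)")
conn.execute("INSERT INTO contact_bad VALUES (1, '138-0001,139-0002')")

# 1NF design: one phone number per row, so every field value is atomic.
conn.execute(
    "CREATE TABLE contact_phone (user_id INTEGER, phone TEXT, PRIMARY KEY (user_id, phone))"
)
packed = conn.execute("SELECT user_id, phones FROM contact_bad").fetchall()
for user_id, phones in packed:
    for phone in phones.split(","):
        conn.execute("INSERT INTO contact_phone VALUES (?, ?)", (user_id, phone))

rows = conn.execute("SELECT user_id, phone FROM contact_phone ORDER BY phone").fetchall()
print(rows)  # [(1, '138-0001'), (1, '139-0002')]

# With atomic values, one number can be found with a plain equality test.
owner = conn.execute(
    "SELECT user_id FROM contact_phone WHERE phone = ?", ("139-0002",)
).fetchone()
print(owner)  # (1,)
```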

Second normal form (2nd NF)

1. The second normal form builds on the first normal form. It additionally requires that every data record in the table can be uniquely identified, and that all non-primary-key fields depend on the entire primary key rather than on only part of it. In other words, if you know the values of all attributes of the primary key, you can retrieve the value of any attribute of any tuple (row). (The primary key in this requirement can be generalized to any candidate key.)

Non-primary-key fields depending completely on the primary key: suppose field 1 and field 2 together form the primary key. Then the value of every other (non-primary-key) field should be determined only by the combination of both fields. If a non-primary-key field's value can be determined by field 1 (or field 2) alone, that field depends on part of the primary key. In this case, it is recommended to move field 1 (or field 2) and the dependent non-primary-key fields into a separate table, and relate that table to the original one.
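The partial-dependency test described above can be sketched in Python. The table, the names `field1` to `field4`, and their dependencies are hypothetical. The code reports every non-key field that some proper subset of the composite key already determines.

```python
from itertools import combinations

# Hypothetical table: (field1, field2) is the composite primary key.
# field3 depends on the whole key; field4 can be determined by field1 alone.
KEY = {"field1", "field2"}
NON_KEY = {"field3", "field4"}
FDS = [
    ({"field1", "field2"}, {"field3"}),
    ({"field1"}, {"field4"}),
]


def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_key, fds):
    """Map each non-key field to the first proper subset of the key that determines it."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in closure(set(subset), fds) & non_key:
                found.setdefault(attr, set(subset))
    return found


print(partial_dependencies(KEY, NON_KEY, FDS))  # {'field4': {'field1'}}
```

Each reported entry is a candidate for the split described above. Here field1 and field4 would move to their own table, and field1 would stay in the original table to relate the two.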

When an order is submitted while shopping on Taobao, one order may contain several products. If all the information is placed in one table with (order number, product number) as the composite primary key, the table violates the second normal form. The order creation time, company name, customer name and similar fields can be determined from the order number alone, and the product quantity can be determined from the product number. The data can therefore be split into two tables.

Third normal form (3rd NF)

1. The third normal form builds on the second normal form and requires every non-primary-key field to be directly related to the primary key. In other words, no non-primary-key field may depend on another non-primary-key field. There must not be a chain in which primary key C determines non-primary attribute B and B in turn determines non-primary attribute A (a transitive dependency C → B → A). Put simply, the non-primary-key attributes must not depend on one another.
2. The primary key here can also be generalized to a candidate key.
For example, once the department number is included in the employee information table, the department name, department profile and other department information should not be added to it as well. They depend on the department number, which is a non-primary-key field, so adding them violates the third normal form.
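The third-normal-form rule can be checked in Python on the employee example. The snake_case column names are hypothetical, and `emp_name` is an assumed extra column. The test applied is the standard 3NF condition: a dependency X → A is allowed only if X is a super key or A is a primary attribute.

```python
# Hypothetical employee table; emp_name is an assumed extra column.
ATTRS = {"emp_no", "emp_name", "dept_no", "dept_name", "dept_profile"}
FDS = [
    ({"emp_no"}, {"emp_name", "dept_no"}),
    ({"dept_no"}, {"dept_name", "dept_profile"}),
]
PRIMARY_ATTRS = {"emp_no"}  # (emp_no) is the only candidate key under these FDs


def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attrs, fds, primary_attrs):
    """Dependencies X -> A where X is not a super key and A is a non-primary attribute."""
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # the determinant is a super key, so this dependency is fine
        for attr in sorted(rhs - lhs - primary_attrs):
            bad.append((tuple(sorted(lhs)), attr))
    return bad


print(third_nf_violations(ATTRS, FDS, PRIMARY_ATTRS))
# [(('dept_no',), 'dept_name'), (('dept_no',), 'dept_profile')]

# After moving the department columns to their own table, neither table violates 3NF.
emp_table = third_nf_violations({"emp_no", "emp_name", "dept_no"}, FDS[:1], {"emp_no"})
dept_table = third_nf_violations({"dept_no", "dept_name", "dept_profile"}, FDS[1:], {"dept_no"})
print(emp_table, dept_table)  # [] []
```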
In summary, every non-primary-key attribute must depend on the key, on the whole key (no partial dependency), and on nothing but the key (no dependency on other non-primary-key attributes).

Denormalization

Overview


Application examples

Denormalization experiment comparison:

  1. Create the database and tables.
  2. Add data.
  3. Query under the design that satisfies the third normal form.
  4. Query results.
  5. Denormalized design.
  6. Query under the denormalized design.
  7. Query results under the denormalized design.
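The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. The `class` and `student` tables are hypothetical and are not the tables from the experiment above. The sketch contrasts a 3NF design, which needs a join to get a student's class name, with a denormalized table that stores `class_name` redundantly. It only shows that both designs return the same answer and makes no claim about timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE class (class_id INTEGER PRIMARY KEY, class_name TEXT);
    CREATE TABLE student (
        stu_id INTEGER PRIMARY KEY, name TEXT,
        class_id INTEGER REFERENCES class(class_id)
    );
    -- Denormalized copy: class_name is stored redundantly on every student row.
    CREATE TABLE student_denorm (
        stu_id INTEGER PRIMARY KEY, name TEXT, class_id INTEGER, class_name TEXT
    );
    """
)
conn.executemany("INSERT INTO class VALUES (?, ?)", [(1, "Class A"), (2, "Class B")])
students = [(i, f"student{i}", 1 + i % 2) for i in range(1, 7)]
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", students)

# Fill the redundant table from a join, as a one-off migration would.
conn.execute(
    """INSERT INTO student_denorm
       SELECT s.stu_id, s.name, s.class_id, c.class_name
       FROM student s JOIN class c ON s.class_id = c.class_id"""
)

# 3NF design: the class name requires a join.
normalized = conn.execute(
    """SELECT s.name, c.class_name FROM student s
       JOIN class c ON s.class_id = c.class_id WHERE s.stu_id = 3"""
).fetchone()
# Denormalized design: a single-table lookup returns the same answer.
denormalized = conn.execute(
    "SELECT name, class_name FROM student_denorm WHERE stu_id = 3"
).fetchone()
print(normalized, denormalized)  # ('student3', 'Class B') ('student3', 'Class B')
```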

New issues in denormalization

  1. The storage space has become larger.
  2. If a field in one table is modified, the redundant fields in another table also need to be modified synchronously, otherwise the data will be inconsistent.
  3. If stored procedures are used to support additional operations such as data updates and deletions, frequent updates will consume a lot of system resources.
  4. When the amount of data is small, denormalization brings no noticeable performance advantage and may only make the database design more complex.
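Problem 2 above, keeping a redundant field consistent, is often handled with a trigger. The Python sketch below uses sqlite3 as a stand-in, and the table and trigger names are hypothetical. In MySQL the same idea would be written with `CREATE TRIGGER ... AFTER UPDATE ON class FOR EACH ROW ...`. Every update to the source table now causes extra writes, which is the resource cost mentioned in problem 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE class (class_id INTEGER PRIMARY KEY, class_name TEXT);
    -- class_name is copied here redundantly to avoid a join when reading students.
    CREATE TABLE student (stu_id INTEGER PRIMARY KEY, class_id INTEGER, class_name TEXT);

    -- Keep the redundant copy consistent whenever the source value changes.
    CREATE TRIGGER sync_class_name AFTER UPDATE OF class_name ON class
    BEGIN
        UPDATE student SET class_name = NEW.class_name WHERE class_id = NEW.class_id;
    END;
    """
)
conn.execute("INSERT INTO class VALUES (1, 'Class A')")
conn.executemany("INSERT INTO student VALUES (?, 1, 'Class A')", [(101,), (102,)])

# Renaming the class fires the trigger, which rewrites every matching student row.
conn.execute("UPDATE class SET class_name = 'Class A (renamed)' WHERE class_id = 1")

names = [row[0] for row in conn.execute("SELECT class_name FROM student ORDER BY stu_id")]
print(names)  # ['Class A (renamed)', 'Class A (renamed)']
```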

Applicable scenarios for denormalization

We adopt denormalization when redundant information is valuable or can greatly improve query efficiency.
1. Suggestions for adding redundant fields
2. The need for historical snapshots and historical data
In real life we often need some redundant information, such as the consignee information in an order: name, phone number and address. The consignee information recorded with each order is a historical snapshot that must be saved, while users can change their own information at any time. In this case, storing the redundant information is necessary.
Denormalization is also common in data warehouse design. Data warehouses usually store historical data. They have weak requirements for real-time inserts, updates and deletes but strong requirements for analyzing historical data, so allowing data redundancy to make analysis more convenient is appropriate there.
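The historical-snapshot case can be sketched with Python's sqlite3 module as a stand-in for MySQL. The `customer` and `orders` tables, the `place_order` helper and the sample values are all hypothetical. The order copies the consignee details at the moment it is placed, so a later change to the customer's address leaves the old order untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, phone TEXT, address TEXT);
    -- Consignee fields are deliberately redundant: they freeze the data at order time.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        consignee_name TEXT,
        consignee_phone TEXT,
        consignee_address TEXT
    );
    """
)


def place_order(conn, order_id, customer_id):
    """Copy the customer's current contact details into the new order row (hypothetical helper)."""
    conn.execute(
        """INSERT INTO orders
           SELECT ?, customer_id, name, phone, address FROM customer WHERE customer_id = ?""",
        (order_id, customer_id),
    )


conn.execute("INSERT INTO customer VALUES (1, 'Zhang San', '138-0000', 'Old Street 1')")
place_order(conn, 1001, 1)
# The customer later moves; the historical order must keep the old address.
conn.execute("UPDATE customer SET address = 'New Road 9' WHERE customer_id = 1")

order_addr = conn.execute(
    "SELECT consignee_address FROM orders WHERE order_id = 1001"
).fetchone()[0]
current_addr = conn.execute(
    "SELECT address FROM customer WHERE customer_id = 1"
).fetchone()[0]
print(order_addr, "|", current_addr)  # Old Street 1 | New Road 9
```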
Follow the three normal forms first, then consider denormalization.

BCNF (Boyce-Codd Normal Form)


Case

Case 1

1. Analyze the paradigm of the following table:
In this table, a warehouse has only one administrator, and an administrator only manages one warehouse. Let’s first sort out the dependencies between these properties.

  1. The warehouse name determines the administrator, and the administrator also determines the warehouse name. At the same time, the attribute set of (warehouse name, item name) can determine the quantity attribute. In this way, we can find the candidate keys of the data table.
  2. Candidate keys: (administrator, item name) and (warehouse name, item name). We then choose one of them as the primary key, for example (warehouse name, item name).
  3. Primary attributes: the attributes contained in any candidate key, namely warehouse name, administrator and item name.
  4. Non-primary attribute: quantity.

2. Whether it conforms to the three normal forms (to judge a table's normal form, check the levels in order, from low to high). First, every attribute of the table is atomic, so it meets 1NF. Second, the non-primary attribute "quantity" depends completely on each candidate key: both (warehouse name, item name) and (administrator, item name) determine the quantity, so it meets 2NF. Finally, no non-primary attribute depends transitively on a candidate key, so it meets 3NF.
3. Existing problems: Since the data table already meets the requirements of 3NF, is there no problem? Let's look at the following situation:

  1. A new warehouse is added but stores no items yet. Entity integrity forbids null values in the primary key, so an insertion anomaly occurs;
  2. If a warehouse changes its administrator, multiple records in the table may have to be modified;
  3. If all the goods in a warehouse are sold out and their rows are deleted, the warehouse name and its administrator are deleted along with them.

You can see that even if the data table meets the requirements of 3NF, there may still be exceptions when inserting, updating, and deleting data.
4. Problem Solving
First, identify the cause of the anomalies: the primary attribute "warehouse name" depends on "administrator", which is only part of the candidate key (administrator, item name). This partial dependency leads to the anomalies above. BCNF was introduced for this reason: on top of 3NF, it also eliminates partial and transitive dependencies of primary attributes on candidate keys.

Formally, a relation R is in BCNF if, for every non-trivial functional dependency X → Y in R, X is a super key of R. In the warehouse table, administrator → warehouse name holds even though administrator alone is not a super key, so the table is not in BCNF.

According to the requirements of BCNF, we need to split the warehouse management relationship warehouse_keeper table into the following:
Warehouse table: (warehouse name, administrator)
Inventory table: (warehouse name, item name, quantity)
Now no primary attribute depends partially or transitively on a candidate key, so the design of these tables complies with BCNF.
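The BCNF condition can be checked in Python on the warehouse example. The snake_case attribute names are hypothetical. Checking only the listed dependencies is enough for the original table. For the split tables, the dependencies were projected onto each table by hand.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Non-trivial dependencies X -> Y whose determinant X is not a super key of `attrs`."""
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs
        if extra and not closure(lhs, fds) >= attrs:
            bad.append((tuple(sorted(lhs)), tuple(sorted(extra))))
    return bad


WAREHOUSE_KEEPER = {"warehouse", "admin", "item", "qty"}
FDS = [
    ({"warehouse"}, {"admin"}),
    ({"admin"}, {"warehouse"}),
    ({"warehouse", "item"}, {"qty"}),
]
print(bcnf_violations(WAREHOUSE_KEEPER, FDS))
# [(('warehouse',), ('admin',)), (('admin',), ('warehouse',))]

# After the split, each table keeps only the dependencies among its own columns.
warehouse = bcnf_violations({"warehouse", "admin"}, FDS[:2])
inventory = bcnf_violations({"warehouse", "item", "qty"}, FDS[2:])
print(warehouse, inventory)  # [] []
```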

Case 2

There is a student tutor table, which contains fields: student ID, major, tutor, major GPA, where student ID and major are joint primary keys.
This table satisfies the three normal forms, but it contains another dependency: "major" depends on "tutor". Each tutor advises for only one major, so once we know the tutor, we know the major. Part of the primary key (major) therefore depends on the non-primary-key attribute tutor. We can fix this by splitting the table into two:

  1. Student-tutor table
  2. Tutor table
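This case is a table that satisfies 3NF but not BCNF. The Python sketch below shows it by finding the candidate keys and applying both tests. The snake_case attribute names are hypothetical. 3NF accepts tutor → major because major is a primary attribute, while BCNF rejects it because tutor is not a super key. The tutor table (tutor, major) on its own is in BCNF.

```python
from itertools import combinations

ATTRS = {"student", "major", "tutor", "gpa"}
FDS = [
    ({"student", "major"}, {"tutor", "gpa"}),
    ({"tutor"}, {"major"}),  # each tutor advises for only one major
]


def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal super keys, tried from smallest to largest attribute set."""
    keys = []
    ordered = sorted(attrs)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue
            if closure(cand, fds) >= attrs:
                keys.append(cand)
    return keys


def is_3nf(attrs, fds):
    primary = set().union(*candidate_keys(attrs, fds))
    # Each dependency needs a super-key determinant or only primary attributes on the right.
    return all(closure(lhs, fds) >= attrs or (rhs - lhs) <= primary for lhs, rhs in fds)


def is_bcnf(attrs, fds):
    # Each non-trivial dependency needs a super-key determinant.
    return all(closure(lhs, fds) >= attrs or rhs <= lhs for lhs, rhs in fds)


print([sorted(k) for k in candidate_keys(ATTRS, FDS)])  # [['major', 'student'], ['student', 'tutor']]
print(is_3nf(ATTRS, FDS), is_bcnf(ATTRS, FDS))          # True False
print(is_bcnf({"tutor", "major"}, [({"tutor"}, {"major"})]))  # True
```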

Fourth normal form (4th NF)

Multi-valued dependency: the value of one attribute determines a set of values of another attribute.
Single-valued (functional) dependency: like a function, the value of one attribute determines exactly one value of another attribute.
On the basis of the third normal form (and, strictly speaking, BCNF), the fourth normal form eliminates non-trivial multi-valued dependencies that are not functional dependencies, so that only trivial multi-valued dependencies remain.

Case

Case 1

Employee table (employee number, employee's child name, employee elective course). The same employee may have several children and may also take several elective courses. The table therefore holds more than one multi-valued fact (two independent one-to-many relationships, that is, non-trivial multi-valued dependencies), so it does not conform to the fourth normal form.
To comply with the fourth normal form, split the table into two so that each holds only one multi-valued fact, for example Employee Table 1 (employee number, employee's child name) and Employee Table 2 (employee number, employee elective course). Each table has only one multi-valued fact, so both conform to the fourth normal form.
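The Python sketch below uses hypothetical data for one employee with two children and three elective courses. Because the two facts are independent, the combined table must pair every child with every course. Its row count is the product of the two counts, while the 4NF split stores only their sum. Joining the two tables on employee number rebuilds the original rows exactly, so the split loses nothing.

```python
from itertools import product

# Hypothetical data: one employee with two children and three elective courses.
children = {"E001": ["Child A", "Child B"]}
courses = {"E001": ["Math", "Art", "Music"]}

# Single table (employee_no, child_name, course). Because the two facts are independent,
# every child must be paired with every course, or the table would imply a false link.
employee = {
    (emp, child, course)
    for emp in children
    for child, course in product(children[emp], courses[emp])
}

# 4NF split: one multi-valued fact per table.
emp_child = {(emp, child) for emp, child, _ in employee}
emp_course = {(emp, course) for emp, _, course in employee}

# A natural join on employee_no rebuilds exactly the original rows (lossless split).
rejoined = {
    (emp, child, course)
    for emp, child in emp_child
    for emp2, course in emp_course
    if emp == emp2
}

print(len(employee), len(emp_child) + len(emp_course))  # 6 5
print(rejoined == employee)  # True
```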

Case 2

Build a model of courses, teachers and textbooks. We stipulate that each course has a corresponding set of teachers and a corresponding set of textbooks, and that the textbooks used in a course are unrelated to its teachers. The relationship table is: course ID, teacher ID and textbook ID, with all three columns forming the joint primary key. (For ease of reading, names are used instead of IDs.)
This table has no fields other than the primary key, so it certainly satisfies BCNF, but multi-valued dependencies still cause anomalies. Suppose that next semester we want to use a new English edition of the advanced mathematics textbook but have not yet decided which teacher will teach the course. We then cannot record the relationship between the course "Advanced Mathematics" and the book "Advanced Mathematics (English edition)" in this table, because the teacher column is part of the primary key and cannot be null.
The solution is to split this multi-valued dependency table into two tables and maintain each relationship separately. The split tables are:
Course-teacher table: (course, teacher)
Course-textbook table: (course, textbook)
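The insertion anomaly above can be reproduced with Python's sqlite3 module as a stand-in for MySQL. The table and column names are hypothetical. MySQL makes PRIMARY KEY columns NOT NULL implicitly, but SQLite does not enforce this on these columns by default, so the sketch declares NOT NULL explicitly. The combined table rejects a course-book row without a teacher, while the split course-book table accepts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out so SQLite rejects nulls in the key columns, as MySQL would.
conn.executescript(
    """
    CREATE TABLE course_teacher_book (
        course TEXT NOT NULL, teacher TEXT NOT NULL, book TEXT NOT NULL,
        PRIMARY KEY (course, teacher, book)
    );
    CREATE TABLE course_teacher (
        course TEXT NOT NULL, teacher TEXT NOT NULL, PRIMARY KEY (course, teacher)
    );
    CREATE TABLE course_book (
        course TEXT NOT NULL, book TEXT NOT NULL, PRIMARY KEY (course, book)
    );
    """
)

new_book = ("Advanced Mathematics", "Advanced Mathematics (English edition)")

# Combined table: the book cannot be recorded until a teacher is known.
try:
    conn.execute("INSERT INTO course_teacher_book VALUES (?, NULL, ?)", new_book)
    combined_ok = True
except sqlite3.IntegrityError:
    combined_ok = False

# Split tables: the course-book fact is stored on its own.
conn.execute("INSERT INTO course_book VALUES (?, ?)", new_book)
book_rows = conn.execute("SELECT COUNT(*) FROM course_book").fetchone()[0]
print(combined_ok, book_rows)  # False 1
```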

Fifth normal form and domain-key normal form

1. Beyond the fourth normal form, there are the more advanced fifth normal form (also called the perfect normal form) and the domain-key normal form (DKNF).
2. On the basis of 4NF, 5NF eliminates join dependencies that are not implied by candidate keys. If every join dependency in relational schema R is implied by a candidate key of R, R is said to be in fifth normal form.
3. A functional dependency is a special case of a multi-valued dependency, and a multi-valued dependency is in turn a special case of a join dependency. However, functional and multi-valued dependencies can be derived directly from semantics, while join dependencies only show up in relational join operations. A relational model with join dependencies may still suffer from data redundancy and from insertion, update and deletion anomalies.
4. The fifth normal form deals with lossless joins. It has little practical significance, because such join dependencies rarely occur and are hard to detect. The domain-key normal form attempts to define an ultimate normal form that takes all types of dependencies and constraints into account, but it has minimal practical value and exists mainly in theoretical research.
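Join dependencies are easiest to see on data. The Python sketch below uses a hypothetical supplier-part-project relation in the style of the classic textbook example. The relation equals the join of its three two-column projections but not the join of any two of them. It therefore satisfies a join dependency that its only candidate key (all three columns) does not imply, and a 5NF design would store the three projections as separate tables. A single instance can only illustrate such a dependency; whether it actually holds is a business rule.

```python
# Hypothetical (supplier, part, project) relation; the key is all three columns.
SPJ = {("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1"), ("s1", "p1", "j1")}

# The three two-column projections.
sp = {(s, p) for s, p, _ in SPJ}
pj = {(p, j) for _, p, j in SPJ}
js = {(j, s) for s, _, j in SPJ}

# Joining any two projections produces spurious rows (a lossy join).
sp_pj = {(s, p, j) for s, p in sp for p2, j in pj if p == p2}
sp_js = {(s, p, j) for s, p in sp for j, s2 in js if s == s2}
pj_js = {(s, p, j) for p, j in pj for j2, s in js if j == j2}
print(len(sp_pj), len(sp_js), len(pj_js))  # 5 5 5

# Joining all three gives back exactly the original rows.
sp_pj_js = {(s, p, j) for s, p, j in sp_pj if (j, s) in js}
print(sp_pj_js == SPJ)  # True
```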

Normal form practical cases


Origin blog.csdn.net/CaraYQ/article/details/130812242