The Simplest Database "Normal Form" Tutorial

Copyright statement: reprinted from https://blog.csdn.net/yangbodong22011/article/details/51619590

Since this is called the simplest database "normal form" tutorial, it has to meet one requirement: after reading this post, you will understand database normal forms and the annoying concepts around them, such as full functional dependency, partial functional dependency, and transitive functional dependency, provided that you follow my line of thought and read carefully. Ready to spend half an hour? Let's start.

Contents

  1. What is a normal form?
  2. An example throughout the text.
  3. First Normal Form (1NF)
  4. Several important concepts.
  5. Second Normal Form (2NF)
  6. Third Normal Form (3NF)
  7. BC Normal Form (BCNF)
  8. Fourth Normal Form (4NF)

1. What is a normal form?

A normal form is a level that describes how well normalized a relational database is. Take an example from everyday life: a teacher asks the class to clean up. The minimum standard is sweeping the floor, the second standard is sweeping the floor plus wiping the desks, and the third standard is sweeping the floor plus wiping the desks plus cleaning the windows. As the standard rises, each higher standard must first meet every lower one. Normal forms work the same way, except that the standard measures the degree of database normalization. The normal forms are 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF (5NF is not discussed in this article). Their ordering is 1NF < 2NF < 3NF < BCNF < 4NF, where a form further to the right is a stricter, higher level.

2. An example throughout the text.

[Figure: the example table (Student ID, Name, Department Name, Department Head, Course Name, Score)]
This example has the following functional dependencies:

(Student ID, Course Name) -> Score
Student ID -> Name
Student ID -> Department Name
Department Name -> Department Head
Note: A->B means that attribute A uniquely determines attribute B.

This example will be refined step by step in the sections on Second Normal Form and Third Normal Form.
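The note on A->B can be checked mechanically against sample data. The following is a minimal sketch in Python; the function name `fd_holds`, the column names, and the sample rows are all made up for illustration and are not the article's original data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if equal values on `lhs` always come with equal values on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows (assumed, not taken from the article's figures)
rows = [
    {"student_id": "1001", "name": "Li Ming", "dept": "Economics",
     "head": "Dr. Wang", "course": "Calculus", "score": 90},
    {"student_id": "1001", "name": "Li Ming", "dept": "Economics",
     "head": "Dr. Wang", "course": "Statistics", "score": 85},
    {"student_id": "1002", "name": "Zhang Wei", "dept": "Law",
     "head": "Dr. Zhao", "course": "Calculus", "score": 78},
]

id_to_name = fd_holds(rows, ["student_id"], ["name"])
id_to_score = fd_holds(rows, ["student_id"], ["score"])
id_course_to_score = fd_holds(rows, ["student_id", "course"], ["score"])
dept_to_head = fd_holds(rows, ["dept"], ["head"])
```

A check like this can only show that the sample data does not contradict a dependency. Whether the dependency really holds is a statement about the meaning of the data, not about one particular set of rows.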

3. First Normal Form (1NF)

Every column of a database table must be an indivisible, atomic data item. This is the minimum standard of a relational database: a database that does not satisfy first normal form is not a relational database. The table below violates it:
[Figure: a table in which one column holds both the department name and the department head]
We cannot create this table directly in a database, because one of its columns holds both the department name and the department head, which is a conflict. Adjusted to conform to first normal form, it should look like this:
[Figure: the same table adjusted to satisfy 1NF]

But even so, the database still suffers from data redundancy and from insertion and deletion anomalies. Consider, for example, the following table:
[Figure: sample data for the example table]

  • Data redundancy: The name, department name, and department head columns repeat the same values many times.
  • Insertion anomaly: If the Department of Computer Science has just been established but has not yet enrolled any students, it cannot be inserted into the table. (Student ID, Course Name) is the key, and key attributes cannot be null.
  • Deletion anomaly: If all the students in the economics department have graduated and their records are deleted, the economics department is deleted along with them.
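The insertion and deletion anomalies can be reproduced with Python's built-in `sqlite3` module. This is only a sketch: the table name `student_course`, the column names, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student_course (
        student_id  TEXT NOT NULL,
        name        TEXT,
        dept_name   TEXT,
        dept_head   TEXT,
        course_name TEXT NOT NULL,
        score       INTEGER,
        PRIMARY KEY (student_id, course_name)
    )
""")

# Hypothetical sample rows (assumed, not the article's data)
rows = [
    ("1001", "Li Ming", "Economics", "Dr. Wang", "Calculus", 90),
    ("1001", "Li Ming", "Economics", "Dr. Wang", "Statistics", 85),
    ("1002", "Zhang Wei", "Law", "Dr. Zhao", "Calculus", 78),
]
conn.executemany("INSERT INTO student_course VALUES (?, ?, ?, ?, ?, ?)", rows)

# Insertion anomaly: a department without students has no key values to store.
try:
    conn.execute(
        "INSERT INTO student_course (dept_name, dept_head) "
        "VALUES ('Computer Science', 'Dr. Li')"
    )
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True

# Deletion anomaly: removing every economics student removes the department too.
conn.execute("DELETE FROM student_course WHERE dept_name = 'Economics'")
remaining_depts = {
    r[0] for r in conn.execute("SELECT DISTINCT dept_name FROM student_course")
}
```

The key columns are declared `NOT NULL` explicitly so that the rejected insert does not depend on how a particular engine treats NULL values in a composite primary key.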

So the design of this database is flawed, and to optimize it, we move on to the second normal form. Before talking about the second normal form, let's talk about a few important concepts.

4. Several important concepts.

  • Functional dependency: In a table, if a given value of attribute A (or attribute group A) always determines exactly one value of attribute B, then B is said to be functionally dependent on A, written A->B. For example, given a student ID, the name is determined: one student ID always maps to exactly one name. It's as simple as that. Does it look familiar? Yes, we already used this concept in the example in Part 2.

  • Full functional dependency: Building on functional dependency, suppose A is an attribute group, and only the whole group determines B, while no proper subset of A does. For example, (student ID, course name) -> score, and neither the student ID nor the course name alone determines the score. This is called a full functional dependency.

  • Partial functional dependency: In contrast to a full functional dependency, some of the attributes in group A are already enough to determine B, and the rest are unnecessary. For example, in (student ID, course name) -> name, the student ID alone is enough. Such a dependency is called a partial functional dependency.

  • Transitive functional dependency: If A->B and B->C, and B does not determine A (otherwise A and B would be equivalent and C would depend on A directly), then A->C holds and C is said to be transitively dependent on A. For example, student ID -> department name and department name -> department head, while the department name does not determine the student ID, so the department head is transitively dependent on the student ID.

  • Key (candidate key, sometimes translated as "code"): An attribute or attribute group that determines every other attribute in the relation, while no proper subset of it does; in other words, the full set of attributes is fully functionally dependent on it. Take (student ID, course name): together they determine all the other attributes, since (student ID, course name) -> score, student ID -> name, student ID -> department name, and student ID -> department name -> department head. So (student ID, course name) is a key of this relation. Is there another key? To find out, the analysis should go from one attribute all the way up to n attributes.

    1. One attribute: student ID, name, department name, department head, course name, score. Look for counter-examples: the student ID cannot determine the score (the course name is also required), the name determines nothing, the department name determines only the department head, the course name determines nothing, and neither does the score. So no single attribute is a key.
    2. Two attributes: (student ID, name), (student ID, department name), (student ID, department head), (student ID, course name), (student ID, score), (name, department name), (name, department head), (name, course name), (name, score), (department name, department head), (department name, course name), (department name, score), (department head, course name), (department head, score), (course name, score). Phew, finally done. Going through them, we find that only (student ID, course name) determines all the other attributes. So it is a key.
    3. Three attributes:  …
    4. Four attributes:  …
    5. Five attributes:  …
    6. Six attributes:  …

Does it really have to be done this way? Yes, that is the only way. There are, however, some shortcuts that reduce the work. For example, once we have found that (student ID, course name) is a key while analyzing two attributes, any later combination that contains (student ID, course name) cannot be a key, because a key requires full functional dependency and so must be minimal. For example, (student ID, course name, department name) does not need to be analyzed. Other shortcuts are left for you to work out.
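The search from one attribute up to n attributes, including the shortcut of skipping any combination that contains a key already found, can be sketched in Python as follows. The helper names `closure` and `candidate_keys` and the attribute spellings are assumptions for illustration. The closure computation (repeatedly applying dependencies until nothing new is added) is the standard textbook technique, not something taken from the original post.

```python
from itertools import combinations

# Functional dependencies of the running example: (left side, right side)
FDS = [
    ({"student_id", "course_name"}, {"score"}),
    ({"student_id"}, {"name"}),
    ({"student_id"}, {"dept_name"}),
    ({"dept_name"}, {"dept_head"}),
]
ATTRS = {"student_id", "name", "dept_name", "dept_head", "course_name", "score"}


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Try combinations from small to large, skipping supersets of known keys."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a key already, so it cannot be minimal
            if closure(candidate, fds) == set(all_attrs):
                keys.append(candidate)
    return keys


keys = candidate_keys(ATTRS, FDS)
```

For these dependencies, `keys` ends up holding the single key (student ID, course name), which matches the manual analysis.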

  • Prime attribute: Any attribute contained in a key is a prime attribute.
  • Non-prime attribute: Any attribute that is not contained in any key.

Phew. Look back and read it through once more; next up is second normal form.

5. Second Normal Form (2NF)

On the basis of first normal form, eliminate the partial functional dependencies of non-prime attributes on the key. For our example, the analysis is as follows:
Prime attributes: student ID, course name
Non-prime attributes: name, department name, department head, score

Currently there is only one table:
(Student ID, Name, Department Name, Department Head, Course Name, Score)

Clearly, both the name and the department name are partially dependent on (student ID, course name), because the student ID alone is enough to determine them. So this table is not in second normal form, and we have to decompose it. The decomposition is not unique; the following is just one option:

Course Selection (student ID, course name, score)
Student (student ID, name, department name, department head)
[Figure: the Course Selection and Student tables]
Let's check whether the two tables satisfy second normal form:
Course Selection table: prime attributes: student ID, course name. Non-prime attribute: score. No non-prime attribute is partially dependent on the key, so the table satisfies second normal form.
Student table: prime attribute: student ID. Non-prime attributes: name, department name, department head. Because the key has only one attribute, a partial dependency on it is impossible. The table satisfies second normal form.
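The 2NF check can be sketched in Python: for each proper subset of the key, see whether it already determines some non-prime attribute. The function names and attribute spellings are assumptions for illustration, and the sketch assumes each table has exactly one key, as in this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, all_attrs, fds):
    """List (key subset, non-prime attributes it determines) pairs.

    Assumes `key` is the only key, so every attribute outside it is non-prime.
    """
    non_prime = set(all_attrs) - set(key)
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_prime
            if determined:
                found.append((set(subset), determined))
    return found


KEY = {"student_id", "course_name"}
original = partial_dependencies(
    KEY,
    {"student_id", "name", "dept_name", "dept_head", "course_name", "score"},
    [
        (KEY, {"score"}),
        ({"student_id"}, {"name"}),
        ({"student_id"}, {"dept_name"}),
        ({"dept_name"}, {"dept_head"}),
    ],
)
course_selection = partial_dependencies(
    KEY, {"student_id", "course_name", "score"}, [(KEY, {"score"})]
)
student = partial_dependencies(
    {"student_id"},
    {"student_id", "name", "dept_name", "dept_head"},
    [
        ({"student_id"}, {"name"}),
        ({"student_id"}, {"dept_name"}),
        ({"dept_name"}, {"dept_head"}),
    ],
)
```

For the original table, the student ID alone turns out to determine the name, department name, and department head, while both decomposed tables come back with an empty list.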

How to find the prime attributes needs no repeating here; if you have forgotten, look back at Part 4. At this point the data looks like this:
[Figure: the data after the 2NF decomposition]

Let's go back and see if the previous problem has been improved:

  • Data redundancy: The redundancy in names, department names, and department heads has been significantly reduced.
  • Insertion anomaly: If a new department is established, its name still cannot be inserted into the Student table, because the student ID is the key and cannot be null. No improvement.
  • Deletion anomaly: When all the students of a department graduate and their records are deleted, the department's information is deleted too. No improvement.

Therefore, it is not enough to only satisfy the second normal form. There are still many problems.

6. Third Normal Form (3NF)

On the basis of second normal form, eliminate the transitive functional dependencies of non-prime attributes on the key. Again, we continue with our example:
Prime attributes: student ID, course name
Non-prime attributes: name, department name, department head, score

Currently our tables are as follows:
Course Selection (student ID, course name, score)
Student (student ID, name, department name, department head)

In the Student table, the non-prime attribute department head is transitively dependent on the key, student ID, because student ID -> department name and department name -> department head. This is what causes the problems listed above. So we decompose the Student table again to eliminate this transitive dependency, as follows:

Course Selection (Student ID, Course Name, Score)
Student (Student ID, Name, Department Name)
Department (Department Name, Department Head)
[Figure: the Course Selection, Student, and Department tables]

We continue the analysis:
In the Course Selection table, the key is (student ID, course name) and the non-prime attribute is score. Score is fully functionally dependent on the key, and no non-prime attribute is transitively dependent on the key, so the table is in third normal form.
In the Student table, the key is the student ID, the prime attribute is the student ID, and the non-prime attributes are name and department name. No non-prime attribute is transitively dependent on the key, so the table is in third normal form.
In the Department table, the key is the department name, the prime attribute is the department name, and the non-prime attribute is the department head. A transitive dependency of a non-prime attribute on the key is impossible here (it would require at least three attributes), so this table is also in third normal form.
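The 3NF analysis can be sketched in Python as a search for dependencies whose left side is not a superkey but whose right side contains a non-prime attribute. The function names and attribute spellings are assumptions for illustration. The sketch assumes each table has a single key and is already in 2NF, and it inspects only the dependencies listed explicitly, not every dependency they imply.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_violations(key, fds):
    """Return (left side, non-prime attributes) pairs that break 3NF."""
    key = set(key)
    violations = []
    for lhs, rhs in fds:
        lhs_is_superkey = key <= closure(lhs, fds)
        non_prime_targets = rhs - lhs - key
        if not lhs_is_superkey and non_prime_targets:
            violations.append((lhs, non_prime_targets))
    return violations


# Student table before the 3NF decomposition
before = transitive_violations(
    {"student_id"},
    [
        ({"student_id"}, {"name"}),
        ({"student_id"}, {"dept_name"}),
        ({"dept_name"}, {"dept_head"}),
    ],
)

# Student and Department tables after the decomposition
student_after = transitive_violations(
    {"student_id"},
    [({"student_id"}, {"name"}), ({"student_id"}, {"dept_name"})],
)
department_after = transitive_violations(
    {"dept_name"}, [({"dept_name"}, {"dept_head"})]
)
```

Before the decomposition, the only offending dependency is department name -> department head; after it, neither table reports a violation.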

The data now looks like this:
[Figure: the data after the 3NF decomposition]

Let's go back and see if the previous problem has been improved:

  • Insertion anomaly: If a new department is established, we can now store its information, because we have the Department table. Improved.
  • Deletion anomaly: When all the students of a department graduate and their records are deleted, the department's information is no longer deleted with them, because we have the Department table. Improved.

Once a database reaches third normal form, the problems of data redundancy and of insertion, deletion, and update anomalies are basically solved. This is also the most basic requirement of a "legitimate" database. Efficiency is another matter: the more tables there are, the more join operations are needed, and joins are relatively resource-intensive. Our example has now been optimized as far as it can go, and there is nothing left to improve. For BC normal form below, we will use a different example.

7. BC Normal Form (BCNF)

On the basis of third normal form, eliminate the partial and transitive functional dependencies of prime attributes on keys. What? Did you read that right? Yes, you did. Let's look at the example below.
Suppose:
1: A company has several warehouses;
2: Each warehouse has only one administrator, and each administrator works in only one warehouse;
3: A warehouse can store multiple items, and one item can be stored in several warehouses. Each item has a quantity in each warehouse where it is stored.

As follows:
[Figure: sample warehouse table with columns warehouse, administrator, item, quantity]

Attributes are: warehouse, administrator, item, quantity.
Let's find the code first:

  • One attribute: none
  • Two attributes: (administrator, item), (warehouse, item)
  • Three attributes: any combination that determines all attributes already contains one of the two keys above, so skip
  • Four attributes: contains the two keys above, so skip

This gives us the prime attributes: administrator, item, warehouse
Non-prime attribute: quantity

The non-prime attribute quantity has no partial or transitive functional dependency on a key. Some readers may wonder: clearly (warehouse, item) -> quantity, so why is that not a partial dependency on the prime attributes? Because the dependency is measured against a key, not against all the prime attributes together. The three prime attributes come from two keys, (administrator, item) and (warehouse, item). Quantity depends on each key as a whole and on no proper subset of either. Therefore this table fully satisfies third normal form, but it still has the following problems:

  • Deletion problem: Take the last record, (Beijing warehouse, Li Si, iPad Mini, 60). If this warehouse will never store the iPad Mini again and we delete the record, the warehouse information can only be deleted along with it.
  • Insertion problem: If a new warehouse is created but no items have been stored in it yet, an administrator cannot be assigned to the warehouse.
  • Modification anomaly: If the administrator of a warehouse changes, every record for that warehouse has to be modified one by one.

Since the table satisfies third normal form, why are there still so many problems? Because a prime attribute is partially dependent on a key. In this example, (administrator, item name) -> warehouse holds, but in fact administrator -> warehouse alone is enough. So the prime attribute warehouse is partially dependent on the key (administrator, item name).

We decompose the schema:
Warehouse (warehouse name, administrator)
Inventory (warehouse name, item name, quantity)
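A common way to state the BCNF condition is that the left side of every non-trivial functional dependency must be a superkey. The following Python sketch applies that test to the warehouse example before and after the decomposition. The function names and attribute spellings are assumptions for illustration, and only the dependencies listed explicitly are inspected.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial dependencies whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)
    ]


WAREHOUSE_FDS = [
    ({"warehouse"}, {"administrator"}),
    ({"administrator"}, {"warehouse"}),
]
INVENTORY_FDS = [({"warehouse", "item"}, {"quantity"})]

# The single table before decomposition
before = bcnf_violations(
    {"warehouse", "administrator", "item", "quantity"},
    WAREHOUSE_FDS + INVENTORY_FDS,
)

# The two tables after decomposition
warehouse_after = bcnf_violations({"warehouse", "administrator"}, WAREHOUSE_FDS)
inventory_after = bcnf_violations({"warehouse", "item", "quantity"}, INVENTORY_FDS)
```

In the single table, warehouse -> administrator and administrator -> warehouse are both flagged, because neither side alone determines item or quantity. In the decomposed tables every left side is a superkey of its own table.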

Looking back again to see if the original problem has been solved

  • Deletion problem: Because warehouses now have a table of their own, deleting an item record no longer affects the warehouse. Improved.
  • Insertion problem: A new warehouse, along with its administrator, can now be inserted directly into the Warehouse table without any inventory record. Improved.
  • Modification anomaly: If the administrator of a warehouse changes, only the Warehouse table needs to be modified. Improved.

A database that satisfies BCNF is already very strictly normalized. But this strictness exists only at the level of functional dependencies. What, there are more levels? That's right! The next area, the subject of fourth normal form, is the level of multivalued dependencies.

8. Fourth Normal Form (4NF)

So let's get straight to the point and look at an example:

Teaching(C, T, B), C is the course, T is the teacher, and B is the reference book.
Consider the following unnormalized relation:
[Figure: the unnormalized Teaching relation]

We first make it satisfy 1NF as follows:
[Figure: the Teaching relation in 1NF]

Numerous problems were found:

  • Information redundancy: There is a lot of data redundancy in courses and teachers.
  • Insertion problem: When a teacher is added to a course, multiple tuples need to be inserted.
  • Deletion problem: To delete a book, many records have to be deleted, which is very troublesome.

Let's see which normal form it belongs to. In this relation, only C, T, and B together identify a row, so the relation is all-key: its only key is (C, T, B). There are no non-prime attributes, and no prime attribute is partially or transitively dependent on a key (an all-key relation has no non-trivial functional dependencies at all). So this relation is in BCNF. Why, then, does it have so many problems? Because it contains multivalued dependencies.

Multivalued dependency: In a relation schema R(U), let X, Y, and Z be subsets of U with Z = U - X - Y. The multivalued dependency X->->Z holds if, for any given pair of values (x, y), a set of z values is determined, and this set depends only on x and has nothing to do with y.

For example, in Teaching, C, B, and T are subsets of the attributes of Teaching, and T = Teaching - C - B. Given (C, B), for example (Database Principles and Applications, SQL Server 2000), a set of T values is determined: (Deng Yu, Sun Ze). But this set of T values depends only on C (Database Principles and Applications) and has nothing to do with B (SQL Server 2000), so we write C->->T and say that T is multivalued dependent on C.
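The condition "the set of teachers depends only on the course and not on the book" can be tested on sample data: for each course, every teacher must appear with every book. Below is a minimal Python sketch. The function name is made up, and the second book title, "Reference Book X", is a placeholder rather than data from the original figure.

```python
from itertools import product


def course_multidetermines_teacher(rows):
    """rows: set of (course, teacher, book). True if C->->T holds in these rows."""
    by_course = {}
    for course, teacher, book in rows:
        teachers, books, pairs = by_course.setdefault(course, (set(), set(), set()))
        teachers.add(teacher)
        books.add(book)
        pairs.add((teacher, book))
    # The dependency holds when each course pairs all its teachers with all its books.
    return all(
        pairs == set(product(teachers, books))
        for teachers, books, pairs in by_course.values()
    )


COURSE = "Database Principles and Applications"
teaching = {
    (COURSE, "Deng Yu", "SQL Server 2000"),
    (COURSE, "Deng Yu", "Reference Book X"),  # hypothetical second book
    (COURSE, "Sun Ze", "SQL Server 2000"),
    (COURSE, "Sun Ze", "Reference Book X"),
}

holds_for_full_data = course_multidetermines_teacher(teaching)
# Dropping one row makes the teacher set depend on the book, so the check fails.
holds_with_missing_row = course_multidetermines_teacher(
    teaching - {(COURSE, "Sun Ze", "Reference Book X")}
)
```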

Trivial multivalued dependency: X->->Z where Y is the empty set.
Non-trivial multivalued dependency: X->->Z where Y is not empty.

With that, the analysis is essentially done: the problems described above arise because the Teaching relation contains multivalued dependencies. So we decompose the schema, and the resulting tables are as follows:

TC table (course, teacher)
BC table (course, reference book)

See if our problem is solved:

  • Information redundancy: improved.
  • Insertion problem: To add a teacher to a course, just add one record to the TC table. Improved.
  • Deletion problem: To delete a book, only one record in the BC table needs to be deleted. Improved.
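The decomposition into (course, teacher) and (course, reference book) can be sketched in Python with sets of tuples: projecting Teaching onto the two tables and joining them back on the course returns exactly the original rows, and adding a teacher becomes a single-row insert. The variable names, the second book title, and the new teacher's name are placeholders for illustration.

```python
COURSE = "Database Principles and Applications"

# Teaching(C, T, B) in 1NF; "Reference Book X" is a hypothetical second book
teaching = {
    (COURSE, "Deng Yu", "SQL Server 2000"),
    (COURSE, "Deng Yu", "Reference Book X"),
    (COURSE, "Sun Ze", "SQL Server 2000"),
    (COURSE, "Sun Ze", "Reference Book X"),
}

# Project onto the two decomposed tables
tc = {(c, t) for c, t, _ in teaching}  # (course, teacher)
bc = {(c, b) for c, _, b in teaching}  # (course, reference book)


def natural_join(tc, bc):
    """Join the two tables on the course column."""
    return {(c, t, b) for c, t in tc for c2, b in bc if c == c2}


lossless = natural_join(tc, bc) == teaching

# Adding a teacher is now one row in tc instead of one row per book.
tc_after = tc | {(COURSE, "New Teacher")}  # hypothetical teacher name
rows_after = natural_join(tc_after, bc)
```

With two teachers and two books, the single Teaching table needs four rows while each decomposed table needs two, and the gap grows as more teachers or books are added.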

The current tables satisfy fourth normal form. After all this, here at last is the definition of fourth normal form: given that a relation schema R satisfies 1NF, if for every non-trivial multivalued dependency X->->Y in R, X contains a key, then R is in 4NF. In the Teaching relation discussed earlier, the key is (T, C, B), so the relation is all-key. The multivalued dependencies C->->B and C->->T are non-trivial, but C alone is not a key, so the relation does not satisfy fourth normal form.

Summary: Functional dependencies and multivalued dependencies are two important kinds of data dependency. If only functional dependencies are considered, BCNF is already a very high level. Beyond them there are multivalued dependencies and join dependencies; join dependencies belong to the territory of 5NF. The three different examples in this article also show that each database should be normalized to the level that suits it, and that for a simple database, reaching third normal form is already very good. Moving to a higher normal form means decomposing tables, and decomposing means that later queries may need joins, which in turn affects efficiency. Finally, here is the famous sixteen-character maxim of database optimization: from low to high, normalize step by step, weigh the pros and cons, and stop when it is enough.

If you have read this far carefully, I think you must have gained something, so give it a thumbs up.

Reference: Teacher Liu Wei

 
