[Database Principles] Relational Database Theory (1)

Database normalization theory.

Suppose we face a practical problem, such as designing a teaching management database. How do we use the relational model to design a reasonable relational database, how do we choose a good set of relation schemas, and which attributes should each relation contain? These are questions of logical database design, and normalization theory is its theoretical basis. The normalization theory of relational databases was first proposed in 1970 by E. F. Codd, the founder of the relational database. Before this theory emerged, designs for hierarchical and network data models followed only the inherent principles of the model itself, so the design and implementation of the data was largely arbitrary; to put it bluntly, it came down to luck. Because such designs lacked a theoretical basis, many unexpected problems could surface later in operation.
In a relational database, the relational model comprises a set of relation schemas, and the relations are not isolated from one another. The key to designing a suitable relational database system is designing the schema of the database, specifically:

  • Which relation schemas the database should include
  • Which attributes each relation schema should include
  • How to build a complete relational database from these interrelated relation schemas

The normalization theory of relational databases covers three main topics: functional dependency, normal forms, and schema design. Functional dependency plays the central role and is the basis of schema decomposition and schema design. Normal forms are the standard against which a schema decomposition is judged.

An unreasonable relation schema.

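Suppose we are asked to design a teaching management database that uses a single relation schema, SCD. Judging from the decomposition given later, its attributes are:
SCD (SNo, SN, Age, Dept, MN, CNo, Score)
that is, student number, student name, age, department name, department head name, course number and score.
A database built on this relation schema has the following problems: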

  • [Data redundancy] Each department name and department head name is stored once for every course selection made by every student in that department. Student names, student numbers, and ages are likewise stored repeatedly, so the redundancy is considerable.
  • [Insertion anomaly] If the school creates a new department, such as a new artificial intelligence department, its name and department head cannot be inserted into the table until the department has enrolled students. In this relation schema (SNo, CNo) is the primary key, and a row whose primary key is empty cannot be inserted.
  • [Deletion anomaly] When all students of a department have graduated and the new intake has not yet enrolled, deleting the student records removes all information about the department from the table, as if the department had been abolished, although in reality it still exists.
  • [Update anomaly] If a department changes its head, the department head attribute must be changed in every row belonging to every student of that department; missing any one of them leaves the data inconsistent.

In a real application of SCD, with a student population in the five or even six digits, maintaining the table this way is almost impossible. This shows that SCD is an inappropriate relation schema. Intuitively, the problems arise because SCD is too comprehensive and tries to hold all the information at once. The fundamental cause is the data dependencies that exist among its attributes.
We call a relation schema such as SCD a universal schema. It uses one large table to store all the data. For some queries the large table gives the result directly, but as described above, mixing all kinds of data together causes many inconveniences and even anomalies.
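
The four problems can be reproduced on a tiny example. The following is a minimal sketch, not the original article's data: it assumes the attribute list SCD (SNo, SN, Age, Dept, MN, CNo, Score), uses SQLite through Python's standard `sqlite3` module as a stand-in database engine, and fills the table with made-up rows. The column types and all sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column types are assumptions; the article gives only attribute names.
conn.execute("""
    CREATE TABLE SCD (
        SNo   TEXT NOT NULL,   -- student number
        SN    TEXT,            -- student name
        Age   INTEGER,
        Dept  TEXT,            -- department name
        MN    TEXT,            -- department head name
        CNo   TEXT NOT NULL,   -- course number
        Score INTEGER,
        PRIMARY KEY (SNo, CNo)
    )
""")

# Hypothetical sample data: two CS students and one IS student.
conn.executemany(
    "INSERT INTO SCD VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("S1", "Zhao", 17, "CS", "Liu", "C1", 90),
        ("S1", "Zhao", 17, "CS", "Liu", "C2", 85),
        ("S2", "Qian", 18, "CS", "Liu", "C1", 70),
        ("S3", "Sun", 20, "IS", "Wang", "C1", 75),
    ],
)

# Data redundancy: the CS head's name is stored once per course selection.
cs_copies = conn.execute(
    "SELECT COUNT(*) FROM SCD WHERE Dept = 'CS'"
).fetchone()[0]

# Insertion anomaly: a department without students has no primary key value.
try:
    conn.execute("INSERT INTO SCD (Dept, MN) VALUES ('AI', 'Chen')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Deletion anomaly: removing the only IS student also removes the IS department.
conn.execute("DELETE FROM SCD WHERE SNo = 'S3'")
is_rows_left = conn.execute(
    "SELECT COUNT(*) FROM SCD WHERE Dept = 'IS'"
).fetchone()[0]

# Update anomaly: changing one department head touches many rows.
rows_touched = conn.execute(
    "UPDATE SCD SET MN = 'Zhang' WHERE Dept = 'CS'"
).rowcount

print(cs_copies, insert_rejected, is_rows_left, rows_touched)
```

The `NOT NULL` constraints on SNo and CNo are written out explicitly because SQLite does not by itself forbid NULL in primary key columns of an ordinary table; other engines reject such rows without the extra constraint.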
Suppose we decompose SCD into the following three schemas: S (SNo, SN, Age, Dept), SC (SNo, CNo, Score) and D (Dept, MN).
To a certain degree, these three relation schemas separate the data. The S table is the student relation; it stores the basic information of students and has nothing to do with selected courses or department heads. The D table is the department relation; it stores department information and has nothing to do with students. The SC table is the course selection relation; it stores the student number, the number of the selected course and the score, and has nothing to do with the other information in the student table or the department table. In this way data redundancy is significantly reduced and the anomalies are eliminated. To create a new artificial intelligence department, you only need to add a row to D; and when the records of all graduated students are deleted, the department does not disappear. From this example we conclude that a reasonable relation schema in a relational database should meet the following conditions:

  • Minimal data redundancy
  • No insertion anomalies
  • No deletion anomalies
  • No update anomalies
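
To check these conditions against the decomposed design, the sketch below builds S, SC and D in SQLite through Python's standard `sqlite3` module and repeats the operations that cause trouble in a single wide table. The column types, the foreign key declarations and all sample rows are assumptions added for illustration; the article specifies only the attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Types and REFERENCES clauses are assumptions, not part of the article.
conn.executescript("""
    CREATE TABLE D (
        Dept TEXT PRIMARY KEY,
        MN   TEXT
    );
    CREATE TABLE S (
        SNo  TEXT PRIMARY KEY,
        SN   TEXT,
        Age  INTEGER,
        Dept TEXT REFERENCES D(Dept)
    );
    CREATE TABLE SC (
        SNo   TEXT NOT NULL REFERENCES S(SNo),
        CNo   TEXT NOT NULL,
        Score INTEGER,
        PRIMARY KEY (SNo, CNo)
    );
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO D VALUES (?, ?)",
                 [("CS", "Liu"), ("IS", "Wang")])
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)",
                 [("S1", "Zhao", 17, "CS"),
                  ("S2", "Qian", 18, "CS"),
                  ("S3", "Sun", 20, "IS")])
conn.executemany("INSERT INTO SC VALUES (?, ?, ?)",
                 [("S1", "C1", 90), ("S1", "C2", 85),
                  ("S2", "C1", 70), ("S3", "C1", 75)])

# No insertion anomaly: a department can exist before it has any students.
conn.execute("INSERT INTO D VALUES ('AI', 'Chen')")
dept_names = [row[0] for row in conn.execute("SELECT Dept FROM D ORDER BY Dept")]

# No deletion anomaly: removing the last IS student keeps the IS department.
conn.execute("DELETE FROM SC WHERE SNo = 'S3'")
conn.execute("DELETE FROM S WHERE SNo = 'S3'")
is_head = conn.execute("SELECT MN FROM D WHERE Dept = 'IS'").fetchone()[0]

# No update anomaly: a new department head is a single-row change.
rows_touched = conn.execute(
    "UPDATE D SET MN = 'Zhang' WHERE Dept = 'CS'"
).rowcount

print(dept_names, is_head, rows_touched)
```

Each department head is now stored in exactly one row of D, which is why the update touches one row regardless of how many students or course selections the department has.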

After the all-encompassing universal schema is decomposed into several relation schemas, the structure of each schema becomes clear and concise. Note, however, that a well-decomposed schema is not the best choice in every situation. For example, to query the courses a student has selected together with the name of the head of the student's department, we must join the tables. A join can be expensive, whereas the same query against the universal schema needs no join, because that single table already contains all of this information.
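
The trade-off can be seen by writing the same query both ways. This is a minimal sketch using SQLite through Python's standard `sqlite3` module; the sample rows and column types are made up, and the wide SCD table is simply materialized from the three smaller tables so that both versions hold the same data. Since the schemas carry only a course number (CNo), the query returns course numbers rather than course names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data for the decomposed schemas S, SC and D.
conn.executescript("""
    CREATE TABLE D  (Dept TEXT PRIMARY KEY, MN TEXT);
    CREATE TABLE S  (SNo TEXT PRIMARY KEY, SN TEXT, Age INTEGER, Dept TEXT);
    CREATE TABLE SC (SNo TEXT NOT NULL, CNo TEXT NOT NULL, Score INTEGER,
                     PRIMARY KEY (SNo, CNo));

    INSERT INTO D  VALUES ('CS', 'Liu'), ('IS', 'Wang');
    INSERT INTO S  VALUES ('S1', 'Zhao', 17, 'CS'), ('S3', 'Sun', 20, 'IS');
    INSERT INTO SC VALUES ('S1', 'C1', 90), ('S1', 'C2', 85), ('S3', 'C1', 75);
""")

# Decomposed design: the answer needs two joins.
joined = conn.execute(
    """
    SELECT S.SN, SC.CNo, D.MN
    FROM S
    JOIN SC ON SC.SNo = S.SNo
    JOIN D  ON D.Dept = S.Dept
    WHERE S.SNo = ?
    ORDER BY SC.CNo
    """,
    ("S1",),
).fetchall()

# Single wide table: build SCD from the same data, then query it directly.
conn.execute("""
    CREATE TABLE SCD AS
    SELECT S.SNo AS SNo, S.SN AS SN, S.Age AS Age, S.Dept AS Dept,
           D.MN AS MN, SC.CNo AS CNo, SC.Score AS Score
    FROM S
    JOIN SC ON SC.SNo = S.SNo
    JOIN D  ON D.Dept = S.Dept
""")
flat = conn.execute(
    "SELECT SN, CNo, MN FROM SCD WHERE SNo = ? ORDER BY CNo",
    ("S1",),
).fetchall()

print(joined)
print(flat)
```

Both queries return the same rows; the difference is that the decomposed design pays for the joins at query time, while the wide table pays for them in redundancy and anomalies at write time.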
Designing relation schemas according to certain standards, decomposing a complex relation into several simple ones, and thereby converting a non-normalized database schema into a normalized one is what we call the normalization of relations.

Origin blog.csdn.net/weixin_44246009/article/details/108077983