1. Introduction
The three normal forms are the rules and guidelines followed when designing table structures in MySQL. Their purpose is to reduce redundancy and produce a reasonably structured database, making data storage and use more efficient.
The three normal forms build on one another: the second normal form presupposes the first, and the third presupposes the second.
Of course, there are more than three normal forms. Beyond these three there are also Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), and Fifth Normal Form (5NF, also known as "Perfect Normal Form").
In this article, we only introduce the three commonly used normal forms.
Although following the normal forms makes a database structure more reasonable, they are not rigid rules. Sometimes we have to adapt them to the actual application scenario.
2. First normal form - 1NF
Follow atomicity. That is, the value in each field of the table cannot be split any further.
Let’s first look at a table structure that does not conform to the first normal form, as follows:
Employee code | Name | Age |
---|---|---|
001 | Xiao Zhang from the Sales Department | 28 |
002 | Xiao Huang from the Operations Department | 25 |
003 | Xiao Gao from the Technology Department | 22 |
In this table, the value in the name field can be split into a department and a name, so the table does not conform to the first normal form. To make it conform, give the department its own column:
Employee code | Department | Name | Age |
---|---|---|---|
001 | Sales | Xiao Zhang | 28 |
002 | Operations | Xiao Huang | 25 |
003 | Technology | Xiao Gao | 22 |
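
As a rough illustration of why the split helps, here is a minimal sketch of the table above. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere; the table and column names (`employee`, `employee_code`, and so on) are assumptions made for this example, not part of the original design.

```python
import sqlite3

# In-memory SQLite database, used only for illustration (the article targets MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_code TEXT PRIMARY KEY,
        department    TEXT NOT NULL,
        name          TEXT NOT NULL,
        age           INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("001", "Sales", "Xiao Zhang", 28),
        ("002", "Operations", "Xiao Huang", 25),
        ("003", "Technology", "Xiao Gao", 22),
    ],
)

# Because the department is its own column, filtering needs no string parsing.
sales_staff = conn.execute(
    "SELECT name FROM employee WHERE department = ?", ("Sales",)
).fetchall()
print(sales_staff)
```

With the unsplit table, the same question would require pattern matching on the combined name field, which is fragile and hard to index.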
So is following the first normal form always a good thing? Consider the following table:
Employee code | Name | Address |
---|---|---|
001 | Xiao Zhang | Donghu District, Nanchang City, Jiangxi Province |
002 | Xiao Huang | Chancheng District, Foshan City, Guangdong Province |
003 | Xiao Gao | Xinzhou District, Wuhan City, Hubei Province |
By observing the above table structure, we found that the address can be further split, such as:
Employee code | Name | Province | City | District |
---|---|---|---|---|
001 | Xiao Zhang | Jiangxi Province | Nanchang City | Donghu District |
002 | Xiao Huang | Guangdong Province | Foshan City | Chancheng District |
003 | Xiao Gao | Hubei Province | Wuhan City | Xinzhou District |
Although the split table seems to follow the first normal form more closely, what if the project only ever needs to output the complete address? In that case the unsplit table is clearly more convenient.
So the normal forms are only a reference; the table structure should be designed mainly around the actual needs of the project.
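
The trade-off can be sketched as follows, again using `sqlite3` in place of MySQL. The table name `employee_address` and its column names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_address ("
    " employee_code TEXT PRIMARY KEY, name TEXT,"
    " province TEXT, city TEXT, district TEXT)"
)
conn.executemany(
    "INSERT INTO employee_address VALUES (?, ?, ?, ?, ?)",
    [
        ("001", "Xiao Zhang", "Jiangxi Province", "Nanchang City", "Donghu District"),
        ("002", "Xiao Huang", "Guangdong Province", "Foshan City", "Chancheng District"),
        ("003", "Xiao Gao", "Hubei Province", "Wuhan City", "Xinzhou District"),
    ],
)

# With split columns, the full address has to be rebuilt on every read
# (SQLite concatenates with ||; MySQL would typically use CONCAT or CONCAT_WS).
full_address = conn.execute(
    "SELECT district || ', ' || city || ', ' || province"
    " FROM employee_address WHERE employee_code = '001'"
).fetchone()[0]
print(full_address)

# The split only pays off if the project filters or groups by part of the address.
in_hubei = conn.execute(
    "SELECT name FROM employee_address WHERE province = ?", ("Hubei Province",)
).fetchall()
print(in_hubei)
```

If the project never runs queries like the second one, the extra columns buy nothing and the single address column is the simpler design.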
3. Second normal form - 2NF
On top of satisfying the first normal form, follow uniqueness and eliminate partial dependencies. That is, every non-key field must be determined by the whole primary key; with a joint primary key, no non-key field may depend on only some of its columns.
To put it more simply, a table should describe only one thing.
Let’s analyze it with a classic case.
Student ID | Name | Age | Course title | Score | Credit |
---|---|---|---|---|---|
001 | Xiao Zhang | 28 | Chinese | 90 | 3 |
001 | Xiao Zhang | 28 | math | 90 | 2 |
002 | Xiao Huang | 25 | Chinese | 90 | 3 |
003 | Xiao Gao | 22 | math | 90 | 2 |
Let's first analyze the table structure.
1. Assuming that the student ID is the only primary key, the name and age can be determined by the student ID, but the course title and score cannot.
2. Assuming that the course title is the only primary key, the credit can be determined by the course title, but the name, age, and score cannot.
3. A joint primary key of student ID and course title does determine every non-key field. However, as the two assumptions above show, name and age depend only on the student ID, and credit depends only on the course title. These are partial dependencies, so the table does not meet the requirements of the second normal form.
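
The three observations above can be checked mechanically. The following is a minimal sketch in plain Python; the helper `determines` and the column names are hypothetical and exist only for this illustration. Note that a check on sample data can only refute a dependency, never prove one: because every score in the sample happens to be 90, the score would appear to depend on almost anything.

```python
# Distinct rows of the table above: (student_id, name, age, course_title, score, credit)
COLUMNS = ("student_id", "name", "age", "course_title", "score", "credit")
rows = [
    ("001", "Xiao Zhang", 28, "Chinese", 90, 3),
    ("001", "Xiao Zhang", 28, "math", 90, 2),
    ("002", "Xiao Huang", 25, "Chinese", 90, 3),
    ("003", "Xiao Gao", 22, "math", 90, 2),
]


def determines(rows, key_cols, target_col):
    """Return True if equal values in key_cols always come with an equal target_col."""
    key_idx = [COLUMNS.index(c) for c in key_cols]
    target_idx = COLUMNS.index(target_col)
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        # setdefault stores the first target value seen for this key;
        # any later row with a different value refutes the dependency.
        if seen.setdefault(key, row[target_idx]) != row[target_idx]:
            return False
    return True


# Partial dependencies: each holds on just one column of the joint key.
print(determines(rows, ["student_id"], "name"))      # name depends on student ID alone
print(determines(rows, ["course_title"], "credit"))  # credit depends on course title alone

# Neither single column can serve as the key for the whole table.
print(determines(rows, ["student_id"], "course_title"))
print(determines(rows, ["course_title"], "name"))
```

The first two checks succeed and the last two fail, which matches the analysis: the joint key is needed to identify a row, yet some fields hang off only half of it.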
So how should we adjust the table structure so that it can meet the requirements of the second normal form?
We can split it into three tables, one for each of the three primary key possibilities above, so that each table describes only one thing.
1. Student table - student ID as primary key
Student ID | Name | Age |
---|---|---|
001 | Xiao Zhang | 28 |
002 | Xiao Huang | 25 |
003 | Xiao Gao | 22 |
2. Course table - course title as primary key
Course title | Credit |
---|---|
Chinese | 3 |
math | 2 |
3. Score table - student ID and course title as joint primary key
Student ID | Course title | Score |
---|---|---|
001 | Chinese | 90 |
001 | math | 90 |
002 | Chinese | 90 |
003 | math | 90 |
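
Here is a minimal sketch of the three tables, using Python's `sqlite3` module as a stand-in for MySQL. The table names (`student`, `course`, `score`), the column names, and the foreign key constraints are assumptions added for illustration. The join at the end shows that the original wide table can still be reconstructed, so nothing is lost by splitting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,
        name       TEXT NOT NULL,
        age        INTEGER NOT NULL
    );
    CREATE TABLE course (
        course_title TEXT PRIMARY KEY,
        credit       INTEGER NOT NULL
    );
    CREATE TABLE score (
        student_id   TEXT NOT NULL REFERENCES student(student_id),
        course_title TEXT NOT NULL REFERENCES course(course_title),
        score        INTEGER,
        PRIMARY KEY (student_id, course_title)
    );
    """
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("001", "Xiao Zhang", 28), ("002", "Xiao Huang", 25), ("003", "Xiao Gao", 22)],
)
conn.executemany("INSERT INTO course VALUES (?, ?)", [("Chinese", 3), ("math", 2)])
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?)",
    [("001", "Chinese", 90), ("001", "math", 90), ("002", "Chinese", 90), ("003", "math", 90)],
)

# Joining the three tables rebuilds the original wide table on demand.
wide = conn.execute(
    """
    SELECT s.student_id, s.name, s.age, c.course_title, sc.score, c.credit
    FROM score AS sc
    JOIN student AS s ON s.student_id = sc.student_id
    JOIN course  AS c ON c.course_title = sc.course_title
    ORDER BY s.student_id, c.course_title
    """
).fetchall()
for row in wide:
    print(row)
```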
At this point we may be wondering: why should we follow the second normal form? What are the consequences of not following it?
1. It causes data redundancy across the whole table.
For example, suppose there are only 2 students, each with a lot of information such as age, gender, height, and address. If this is stored in the same table as the course information and each student takes 3 courses, the table grows to 6 rows, each repeating the full student information. After splitting, we store only 2 rows in the student table and 3 rows in the course table, and the score table keeps only the student ID, course title, and score fields.
2. It is inconvenient to update data.
Suppose the credit of a course changes. In the unsplit table we have to update the credit in every row for that course. With a separate course table, we only need to update that course's single row.
3. Inserting data is inconvenient or fails.
① Assume that the primary key is the student ID or the course title alone. When a new course is added, perhaps only some students have taken it, so we have to specify which students the new course rows are inserted for. And because the scores may not be available yet, we have to leave them empty and update the rows again later.
② Assume that the primary key is the joint primary key of student ID and course title. When a new course is added that no one has taken yet, there is no student ID to fill the primary key, so the course information cannot be inserted at all.
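
The update and insert problems described above can be reproduced in a small sketch. As before, `sqlite3` stands in for MySQL, and the table names (`course`, `wide`), the column names, and the new course 'English' are hypothetical values chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Split design: one row per course.
    CREATE TABLE course (course_title TEXT PRIMARY KEY, credit INTEGER NOT NULL);
    INSERT INTO course VALUES ('Chinese', 3), ('math', 2);

    -- Unsplit design with a joint primary key. The key columns are declared
    -- NOT NULL explicitly; MySQL makes primary key columns NOT NULL implicitly.
    CREATE TABLE wide (
        student_id   TEXT NOT NULL,
        name         TEXT,
        age          INTEGER,
        course_title TEXT NOT NULL,
        score        INTEGER,
        credit       INTEGER,
        PRIMARY KEY (student_id, course_title)
    );
    INSERT INTO wide VALUES
        ('001', 'Xiao Zhang', 28, 'Chinese', 90, 3),
        ('001', 'Xiao Zhang', 28, 'math',    90, 2),
        ('002', 'Xiao Huang', 25, 'Chinese', 90, 3),
        ('003', 'Xiao Gao',   22, 'math',    90, 2);
    """
)

# Update: the split design touches one row, the unsplit design one row per enrolment.
split_rows = conn.execute(
    "UPDATE course SET credit = 4 WHERE course_title = 'Chinese'"
).rowcount
wide_rows = conn.execute(
    "UPDATE wide SET credit = 4 WHERE course_title = 'Chinese'"
).rowcount
print(split_rows, wide_rows)

# Insert: a course nobody has taken yet fits in the course table...
conn.execute("INSERT INTO course VALUES ('English', 2)")

# ...but not in the unsplit table, because the student ID part of the key is missing.
try:
    conn.execute("INSERT INTO wide (course_title, credit) VALUES ('English', 2)")
    wide_insert_ok = True
except sqlite3.IntegrityError:
    wide_insert_ok = False
print(wide_insert_ok)
```

In the unsplit table the number of rows to update grows with the number of students enrolled, and forgetting one of them leaves the same course with two different credits.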
4. Third normal form - 3NF
Eliminate transitive dependencies while satisfying the second normal form. That is, every non-key field must depend directly on the primary key: there must be no non-key field A that determines another non-key field B.
Again, let's use a classic example.
Student ID | Name | Class | Head teacher |
---|---|---|---|
001 | Xiao Huang | First grade (1) class | Teacher Gao |
In this table, the student ID is the primary key. It uniquely determines the name, class, and head teacher, so the table conforms to the second normal form. However, among the non-key fields, the head teacher can also be derived from the class, so the table does not conform to the third normal form.
So how should the table structure be designed so that it conforms to the third normal form?
1. Student table
Student ID | Name | Class |
---|---|---|
001 | Xiao Huang | First grade (1) class |
2. Class table
Class | Head teacher |
---|---|
First grade (1) class | Teacher Gao |
By moving the mapping between class and head teacher into its own table, we eliminate the transitive dependency.
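
A minimal sketch of this split, with `sqlite3` standing in for MySQL. The table names (`school_class`, `student`), the column names, the second student, and the replacement teacher 'Teacher Li' are hypothetical additions used only to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE school_class (
        class_name   TEXT PRIMARY KEY,
        head_teacher TEXT NOT NULL
    );
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,
        name       TEXT NOT NULL,
        class_name TEXT NOT NULL REFERENCES school_class(class_name)
    );
    INSERT INTO school_class VALUES ('First grade (1) class', 'Teacher Gao');
    INSERT INTO student VALUES ('001', 'Xiao Huang', 'First grade (1) class');
    -- Hypothetical second student in the same class, added for the demonstration.
    INSERT INTO student VALUES ('002', 'Xiao Zhang', 'First grade (1) class');
    """
)

# The head teacher is stored once per class, so a change touches a single row.
changed = conn.execute(
    "UPDATE school_class SET head_teacher = 'Teacher Li'"
    " WHERE class_name = 'First grade (1) class'"
).rowcount

# Each student's head teacher is looked up through the class when needed.
teachers = conn.execute(
    """
    SELECT s.name, c.head_teacher
    FROM student AS s
    JOIN school_class AS c ON c.class_name = s.class_name
    ORDER BY s.student_id
    """
).fetchall()
print(changed, teachers)
```

Had the head teacher stayed in the student table, the same change would have required updating every student of that class.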
Summary
Readers may have noticed that the ultimate purpose of the normal forms introduced above is to reduce our workload. So although the normal forms are good guidelines, in practice we need not be bound by them too tightly. Instead, we should start from the project and design a table structure that suits it.
The following is a brief summary of the three normal forms in this article:
- First normal form (1NF): Fields cannot be split any further.
- Second normal form (2NF): Every non-key field is determined by the whole primary key, not by only part of a joint primary key.
- Third normal form (3NF): Every non-key field depends directly on the primary key; no non-key field A determines another non-key field B.
Reprinted from https://zhuanlan.zhihu.com/p/590135927