What are database normal forms

Foreword:

I had often heard of database normal forms but never understood them in detail. Most database textbooks and courses cover them, and they show up regularly in database exam questions. Do you have a clear understanding of them yourself? In this article, let's learn about database normal forms together.

1. Introduction to database normal forms

To build a database with little redundancy and a sensible structure, certain rules must be followed during design. In relational databases these rules are called normal forms (the term is sometimes rendered as "paradigm" in translation). Each normal form is a set of design requirements that a table must meet, and a well-structured relational database should satisfy at least some of them.

Normal form is abbreviated NF. The normal forms were worked out after the British computer scientist E. F. Codd proposed the relational database model in 1970. They are a foundation of relational database theory, and they are the rules and guidelines we follow when designing a database structure.

There are currently six common normal forms for relational databases: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF) and Fifth Normal Form (5NF, sometimes called the perfect normal form). The normal form with the fewest requirements is 1NF. A table that satisfies 1NF and also meets further requirements is in 2NF, and the remaining normal forms build on one another in the same way.

2. Detailed explanation of common normal forms

Database design refers to the normal forms, but a higher normal form is not always better. A higher normal form constrains the relationships in the data more tightly, but it also splits the data across more tables. Each operation then has to touch more tables, which is cumbersome and can lower database performance. In practice, relational designs usually aim for 3NF and go no further than BCNF. In other words, the first three normal forms are enough in most cases. Let's take a closer look at them.

First Normal Form (1NF)

The first normal form is the most basic. A table satisfies 1NF if every field value in it is an atomic value that cannot be decomposed further. Put simply, each column of each row holds a single, indivisible value, and one column cannot hold multiple values. If an attribute repeats, it should be moved into a new entity.

Example: Suppose a company wants to store the names and contact details of its employees. It creates a table as follows:

[Table not preserved: employee records in which Jon and Lester each have two numbers in the emp_mobile column]

Two employees (Jon and Lester) each have two mobile phone numbers, and the company stores both numbers in the same field, as shown in the table above. The table therefore does not comply with 1NF: the rule says "each attribute of the table must have an atomic (single) value", and the emp_mobile values for Jon and Lester violate it. To make the table comply with 1NF, the data should be stored as follows:

[Table not preserved: the same records rearranged so that every emp_mobile value is atomic]
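Since the original tables are not available here, the following is a minimal Python sketch of the 1NF fix using the standard-library sqlite3 module. The names Jon and Lester and the column emp_mobile come from the text. The phone numbers, the third employee, the table name employee_mobile, and the helper to_first_normal_form are invented for illustration. Giving each number its own row is one common way to reach 1NF, not necessarily the layout the original table used.

```python
import sqlite3

# Illustrative data only: the phone numbers and the employee "Maria" are made up.
raw_rows = [
    ("Jon", "555-0101, 555-0102"),      # two numbers packed into one field
    ("Lester", "555-0201, 555-0202"),   # violates 1NF in the same way
    ("Maria", "555-0301"),
]


def to_first_normal_form(rows):
    """Split each multi-valued emp_mobile field into one atomic value per row."""
    atomic = []
    for name, mobiles in rows:
        for mobile in mobiles.split(","):
            atomic.append((name, mobile.strip()))
    return atomic


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_mobile ("
    " emp_name TEXT NOT NULL,"
    " emp_mobile TEXT NOT NULL,"
    " PRIMARY KEY (emp_name, emp_mobile))"
)
conn.executemany(
    "INSERT INTO employee_mobile VALUES (?, ?)",
    to_first_normal_form(raw_rows),
)

# Each number can now be queried, indexed, or updated on its own.
jon_numbers = [
    row[0]
    for row in conn.execute(
        "SELECT emp_mobile FROM employee_mobile"
        " WHERE emp_name = ? ORDER BY emp_mobile",
        ("Jon",),
    )
]
print(jon_numbers)  # ['555-0101', '555-0102']
```

With one value per field, a lookup by phone number becomes a plain equality comparison instead of a substring search.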

Second Normal Form (2NF)

The second normal form builds on the first. It requires every non-key column to depend on the whole primary key, not on just part of it (this matters mainly for composite primary keys). In other words, a table should store only one kind of data, and several kinds of data should not be mixed in the same table.

+----------+-------------+-------+
| employee | department  | head  |
+----------+-------------+-------+
| Jones    | Accounting  | Jones |
| Smith    | Engineering | Smith |
| Brown    | Accounting  | Jones |
| Green    | Engineering | Smith |
+----------+-------------+-------+

The table above records each employee, the department they work in, and the head of that department. The primary key of a table is the data that uniquely identifies a row, which here is employee. The head column is determined by department rather than directly by the primary key, so the table mixes two kinds of data: which department an employee belongs to, and who heads each department. Strictly speaking, because the key here is a single column, this is an indirect (transitive) dependency of the kind 3NF rules out; a 2NF violation in the narrow sense needs a composite key with a column that depends on only part of it. Either way, the fix is to split the table into two:

-- employee is the primary key
+----------+-------------+
| employee | department  |
+----------+-------------+
| Brown    | Accounting  |
| Green    | Engineering |
| Jones    | Accounting  |
| Smith    | Engineering |
+----------+-------------+

-- department is the primary key
+-------------+-------+
| department  | head  |
+-------------+-------+
| Accounting  | Jones |
| Engineering | Smith |
+-------------+-------+
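The decomposition can be checked mechanically. The sketch below is illustrative only: the helper determines and the variable names are assumptions, not part of the original article. It tests whether one column functionally determines another in the sample data, then splits the table and joins it back to confirm that no information is lost.

```python
def determines(rows, lhs, rhs):
    """Return True if each value of column lhs maps to exactly one value of rhs."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# The sample rows from the table above.
staff = [
    {"employee": "Jones", "department": "Accounting", "head": "Jones"},
    {"employee": "Smith", "department": "Engineering", "head": "Smith"},
    {"employee": "Brown", "department": "Accounting", "head": "Jones"},
    {"employee": "Green", "department": "Engineering", "head": "Smith"},
]

# head is fixed by department alone, so it can move to its own table.
print(determines(staff, "department", "head"))  # True
print(determines(staff, "head", "employee"))    # False

# Decompose into the two tables shown above.
employee_dept = {row["employee"]: row["department"] for row in staff}
dept_head = {row["department"]: row["head"] for row in staff}

# Joining on department rebuilds the original rows.
rejoined = [
    {"employee": emp, "department": dept, "head": dept_head[dept]}
    for emp, dept in employee_dept.items()
]


def by_employee(row):
    return row["employee"]


print(sorted(rejoined, key=by_employee) == sorted(staff, key=by_employee))  # True
```

After the split, each department's head is stored once (two entries) instead of once per employee (four entries).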

Third Normal Form (3NF)

On top of satisfying 2NF, the third normal form requires the non-key fields to be independent of one another. Every column in the table must depend on the primary key directly, not indirectly through another non-key column.

In short, 3NF requires that a relation not contain non-key information that is already held in another relation. For example, suppose there is a department table in which each department has a department number (dept_id), a department name, and a department profile. Once the employee table stores the department number, it must not also store department details such as the name and profile. If no department table exists yet, one should be created as 3NF requires; otherwise the department details are repeated for every employee, causing a lot of data redundancy.
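The department example can be sketched with the standard-library sqlite3 module. Only dept_id is named in the text. The table names, the other column names (dept_name, dept_profile, emp_id, emp_name), and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the reference
conn.executescript("""
CREATE TABLE department (
    dept_id      INTEGER PRIMARY KEY,
    dept_name    TEXT NOT NULL,
    dept_profile TEXT
);
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL,
    -- only the department number lives here, not the name or profile
    dept_id  INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Accounting', 'Keeps the books');
INSERT INTO employee VALUES (1, 'Jones', 1), (2, 'Brown', 1);
""")

# Renaming the department touches exactly one row.
conn.execute("UPDATE department SET dept_name = 'Finance' WHERE dept_id = 1")

rows = conn.execute(
    "SELECT e.emp_name, d.dept_name"
    " FROM employee AS e"
    " JOIN department AS d ON d.dept_id = e.dept_id"
    " ORDER BY e.emp_name"
).fetchall()
print(rows)  # [('Brown', 'Finance'), ('Jones', 'Finance')]
```

Because the name is stored once, every employee sees the new value after a single update, at the cost of a join when reading.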

3. About denormalization

The advantages of normalization are obvious. It avoids a lot of data redundancy, saves storage space, and helps keep data consistent. Normalized tables are usually smaller and fit in memory more easily, so operations on them run faster. So is a design optimal as long as every table is in 3NF? Not necessarily. The higher the normal form, the more finely the tables are divided and the more tables the database needs, so data that belongs together ends up spread across several tables. Even a slightly complex query on a normalized database may need at least one join, and perhaps more. Joins are expensive, and they can also make some indexing strategies ineffective.

Therefore, we do not always follow the normal forms completely when designing a database, and sometimes we deliberately denormalize. Adding redundant or repeated data reduces the number of tables that a query has to join and so improves read performance.
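As a rough illustration of that trade-off, the sketch below keeps a redundant dept_name column in the employee table so that reads need no join. The schema, the sample rows, and the helper rename_department are all hypothetical. The point is that every write must now keep the copies in sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id    INTEGER PRIMARY KEY,
    emp_name  TEXT NOT NULL,
    dept_id   INTEGER NOT NULL,
    dept_name TEXT NOT NULL  -- redundant copy, kept to avoid a join on reads
);
INSERT INTO department VALUES (1, 'Accounting');
INSERT INTO employee VALUES (1, 'Jones', 1, 'Accounting'),
                            (2, 'Brown', 1, 'Accounting');
""")

# Read path: a single-table query, no join needed.
before = conn.execute(
    "SELECT emp_name, dept_name FROM employee ORDER BY emp_name"
).fetchall()
print(before)  # [('Brown', 'Accounting'), ('Jones', 'Accounting')]


def rename_department(conn, dept_id, new_name):
    """Update the source row and every redundant copy in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE department SET dept_name = ? WHERE dept_id = ?",
            (new_name, dept_id),
        )
        conn.execute(
            "UPDATE employee SET dept_name = ? WHERE dept_id = ?",
            (new_name, dept_id),
        )


rename_department(conn, 1, "Finance")
after = conn.execute("SELECT DISTINCT dept_name FROM employee").fetchall()
print(after)  # [('Finance',)]
```

If the second update were forgotten, the two tables would disagree about the department name, which is exactly the inconsistency that normalization guards against.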

Origin blog.51cto.com/10814168/2547281