Hello, everyone! I'm Little Grey Ape, a programmer who writes bugs!
When developing a formal project, we usually design the database around the project's requirements, and the design has to take data redundancy and simplicity into account. The normal forms are rules for designing relational databases.
What is a normal form?
When every attribute of a relation holds data items that cannot be subdivided, the relation is said to be normalized; in other words, it contains no composite or multi-valued data items. Through decomposition, a relation in a lower normal form can be transformed into a set of relations in higher normal forms, and this process is called normalization. Relational tables are classified into five levels of normal form: 1NF, 2NF, 3NF, 4NF and 5NF (Boyce-Codd normal form, BCNF, sits between 3NF and 4NF). 1NF is the minimum requirement and 5NF the strictest. When designing a database, we should follow the higher normal forms as our design standard.
Next, let me briefly walk you through the three normal forms most commonly used in databases:
First Normal Form (1NF)
Concept: every column value in a table is indivisible, which guarantees the atomicity of each value.
The concept is easy to grasp: 1NF requires that the value in every column cannot be split any further. For example:
Faculty | Number of people (Boys | Girls)
Software College | 1652 | 689
IT Academy | 1264 | 489
Big Data Academy | 653 | 534
The "Number of people" column can be divided further into the number of boys and the number of girls, so it violates the 1NF requirement that the value of every column be indivisible.
So how do we turn such a non-conforming design into one that satisfies 1NF?
It's simple: split every attribute that can be subdivided into its own column.
The table above, redesigned to satisfy 1NF:
Faculty | Boys | Girls
Software College | 1652 | 689
IT Academy | 1264 | 489
Big Data Academy | 653 | 534
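To see why atomic columns matter in practice, here is a minimal sketch using Python's built-in sqlite3 module. The table name `faculty` and its column names are hypothetical; the counts come from the table above. Because boys and girls are stored in separate columns, plain SQL can sum or combine them, whereas a single "Number of people" cell holding both values would have to be parsed first.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# 1NF design: each column holds a single, indivisible value.
conn.execute("""
    CREATE TABLE faculty (
        name  TEXT PRIMARY KEY,
        boys  INTEGER NOT NULL,
        girls INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO faculty (name, boys, girls) VALUES (?, ?, ?)",
    [
        ("Software College", 1652, 689),
        ("IT Academy", 1264, 489),
        ("Big Data Academy", 653, 534),
    ],
)

# Atomic columns can be aggregated and combined directly in SQL.
total_girls = conn.execute("SELECT SUM(girls) FROM faculty").fetchone()[0]
totals = dict(conn.execute("SELECT name, boys + girls FROM faculty"))
print(total_girls)           # 1712
print(totals["IT Academy"])  # 1753
```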
Second Normal Form (2NF)
Concept: on top of 1NF, every non-prime attribute must be fully functionally dependent on the primary key.
In other words, the table must satisfy 1NF, every row must be uniquely identifiable by its primary key, and no non-key attribute may depend on only part of that key.
Note, however, that a relation in 2NF can still suffer from data redundancy and update anomalies; these are the remaining weaknesses of 2NF.
So what does a table that satisfies 1NF but not 2NF look like?
Consider this table:
Employee number | Name | Job title | Project number | Project name
1 | Zhang San | programmer | 111 | C development
2 | Li Si | software designer | 222 | Java development
3 | Wang Wu | Senior Architect | 333 | Big data development
If an employee can work on several projects, a row is identified only by the composite key (employee number, project number). The name and the job title are determined by the employee number alone (employee number → name, employee number → job title), while the project name is determined by the project number alone (project number → project name). Each of these non-key attributes depends on only part of the composite key, which violates the 2NF requirement that every non-prime attribute be fully dependent on the primary key.
To bring such a table into 2NF, move the attributes that depend on only part of the key into their own tables: put the employee information (employee number, name, job title) in one table and the project information (project number, project name) in another, and keep the (employee number, project number) pairs in a third table that records who works on which project.
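A minimal sqlite3 sketch of that decomposition, assuming hypothetical table names `employee`, `project` and `assignment`. Each fact is stored exactly once, and a join rebuilds the original wide table on demand:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Facts that depend only on the employee number
    CREATE TABLE employee (
        emp_no    INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        job_title TEXT NOT NULL
    );
    -- Facts that depend only on the project number
    CREATE TABLE project (
        project_no   INTEGER PRIMARY KEY,
        project_name TEXT NOT NULL
    );
    -- Which employee works on which project (the composite key)
    CREATE TABLE assignment (
        emp_no     INTEGER REFERENCES employee(emp_no),
        project_no INTEGER REFERENCES project(project_no),
        PRIMARY KEY (emp_no, project_no)
    );
    INSERT INTO employee VALUES
        (1, 'Zhang San', 'programmer'),
        (2, 'Li Si', 'software designer'),
        (3, 'Wang Wu', 'Senior Architect');
    INSERT INTO project VALUES
        (111, 'C development'),
        (222, 'Java development'),
        (333, 'Big data development');
    INSERT INTO assignment VALUES (1, 111), (2, 222), (3, 333);
""")

# A join reproduces the original table without storing any fact twice.
rows = conn.execute("""
    SELECT e.emp_no, e.name, e.job_title, p.project_no, p.project_name
    FROM assignment a
    JOIN employee e ON e.emp_no = a.emp_no
    JOIN project  p ON p.project_no = a.project_no
    ORDER BY e.emp_no
""").fetchall()
for row in rows:
    print(row)
```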
Third Normal Form (3NF)
Concept: the relation satisfies 2NF, and no non-prime attribute is transitively dependent on any candidate key.
In other words, a non-prime attribute must not depend on another non-prime attribute; every attribute depends directly on the primary key.
That is, each attribute is related to the primary key directly, never indirectly through a chain such as A → B → C.
For example, the structure shown in the following table:
Student ID | Name | Age | Gender | College | College address | College phone
111111 | Zhang San | 21 | male | Software College | Huiji District | 123456
222222 | Li Si | 22 | Female | IT Academy | Jinshui District | 123455
333333 | Wang Wu | 23 | male | Big Data Academy | Erqi District | 123444
The table above contains the dependency chain student ID → college → (college address, college phone): the address and phone depend on the student ID only transitively, through the college. We should decompose it as follows:
Student (student ID, name, age, gender, college) and College (college, college address, college phone)
Now every attribute depends directly on the primary key of its own table.
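The same decomposition as a sqlite3 sketch, with hypothetical table and column names. The college's address and phone now live in a single row, so changing a phone number (the new value `654321` is made up) is a one-row update, and a join still recovers the student ID → college → phone chain when it is needed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- College facts are stored once, keyed by the college name
    CREATE TABLE college (
        college TEXT PRIMARY KEY,
        address TEXT NOT NULL,
        phone   TEXT NOT NULL
    );
    -- Each student column depends only on the student ID; college is a
    -- foreign key rather than a copy of the college's address and phone
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,
        name       TEXT NOT NULL,
        age        INTEGER NOT NULL,
        gender     TEXT NOT NULL,
        college    TEXT NOT NULL REFERENCES college(college)
    );
    INSERT INTO college VALUES
        ('Software College', 'Huiji District', '123456'),
        ('IT Academy', 'Jinshui District', '123455'),
        ('Big Data Academy', 'Erqi District', '123444');
    INSERT INTO student VALUES
        ('111111', 'Zhang San', 21, 'male', 'Software College'),
        ('222222', 'Li Si', 22, 'female', 'IT Academy'),
        ('333333', 'Wang Wu', 23, 'male', 'Big Data Academy');
""")

# Changing a college's phone touches one row, however many students it has.
cur = conn.execute(
    "UPDATE college SET phone = ? WHERE college = ?", ("654321", "Software College")
)
updated = cur.rowcount
print(updated)  # 1

# Student ID -> college -> phone, recovered with a join.
phone = conn.execute("""
    SELECT c.phone FROM student s JOIN college c ON c.college = s.college
    WHERE s.student_id = '111111'
""").fetchone()[0]
print(phone)  # 654321
```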
Boyce-Codd Normal Form (BCNF)
Concept: BCNF is also called the improved third normal form. On top of 3NF, it also eliminates partial and transitive functional dependencies of prime attributes on candidate keys; equivalently, for every nontrivial functional dependency X → Y, X must be a superkey.
Why call it an improved 3NF? Because 3NF still has flaws in certain situations. What are those flaws, and how do we fix them?
Let's illustrate with a concrete table:
Consider a warehouse management system with several warehouses. Each warehouse has exactly one administrator, and each administrator works in only one warehouse. A warehouse can store many items, and the same item can be stored in different warehouses. Each item has a quantity in every warehouse that stores it.
Warehouse name | Administrator | Item name | Quantity
Warehouse One | Zhang San | Cell phone | 12
Warehouse One | Zhang San | computer | 20
Warehouse No. 2 | Li Si | Motor | 35
Warehouse No. 2 | Li Si | air conditioning | 26
So which normal form does the relation Warehouse (warehouse name, administrator, item name, quantity) satisfy?
Let's analyze it:
Functional dependencies: warehouse name → administrator, administrator → warehouse name, (warehouse name, item name) → quantity
Candidate keys: (administrator, item name), (warehouse name, item name)
Prime attributes: warehouse name, administrator, item name
Non-prime attribute: quantity
No non-prime attribute depends partially or transitively on a candidate key, so this relation is in 3NF.
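The analysis above can be checked mechanically with attribute closures. The following is a small, hypothetical Python sketch: the function and attribute names are my own, and the candidate-key search is brute force, so it only suits tiny schemas. It finds the two candidate keys, confirms that every dependency passes the 3NF test (the left side is a superkey, or every right-side attribute is prime), and shows that two dependencies fail the stricter BCNF test, which requires every nontrivial left side to be a superkey:

```python
from itertools import combinations

# Attributes and functional dependencies from the analysis above (names are hypothetical).
ATTRS = frozenset({"warehouse", "administrator", "item", "quantity"})
FDS = [
    (frozenset({"warehouse"}), frozenset({"administrator"})),
    (frozenset({"administrator"}), frozenset({"warehouse"})),
    (frozenset({"warehouse", "item"}), frozenset({"quantity"})),
]

def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = frozenset(combo)
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def is_superkey(lhs, attrs, fds):
    return closure(lhs, fds) == attrs

def bcnf_violations(attrs, fds):
    """Nontrivial FDs whose left-hand side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not is_superkey(lhs, attrs, fds)]

def third_nf_violations(attrs, fds):
    """FDs X -> A where X is not a superkey and A is not a prime attribute."""
    prime = frozenset().union(*candidate_keys(attrs, fds))
    return [(lhs, rhs) for lhs, rhs in fds
            if not is_superkey(lhs, attrs, fds) and not (rhs - lhs) <= prime]

keys = candidate_keys(ATTRS, FDS)
print(sorted(sorted(k) for k in keys))  # [['administrator', 'item'], ['item', 'warehouse']]
print(third_nf_violations(ATTRS, FDS))  # []
for lhs, rhs in bcnf_violations(ATTRS, FDS):
    print(sorted(lhs), "->", sorted(rhs))
# ['warehouse'] -> ['administrator']
# ['administrator'] -> ['warehouse']
```

Testing only the given dependencies is enough here because both checks are applied to the original relation, not to a projection of it.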
Now let's perform some operations on the table above:
1. Add a new item, "tablet", to Warehouse One.
We have to enter a full row (warehouse name, administrator name, item name, quantity). You may notice a problem: to store an item, I only need to know which warehouse it goes into, so why must I also enter the administrator's name? That is redundant and tedious.
2. Make "Wang Wu" the administrator of Warehouse No. 2.
We then have to change the administrator name to "Wang Wu" in every row for Warehouse No. 2. Isn't that tedious?
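Both anomalies can be reproduced with a sqlite3 sketch of the original single table. The table name `stock` is hypothetical, and so is the tablet quantity of 5, since the text gives none. Reassigning the administrator has to touch every row of Warehouse No. 2, and an edit that reaches only one of those rows leaves the warehouse with two administrators:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The original single table; column names are hypothetical.
conn.execute("""
    CREATE TABLE stock (
        warehouse     TEXT,
        administrator TEXT,
        item          TEXT,
        quantity      INTEGER,
        PRIMARY KEY (warehouse, item)
    )
""")
conn.executemany("INSERT INTO stock VALUES (?, ?, ?, ?)", [
    ("Warehouse One", "Zhang San", "Cell phone", 12),
    ("Warehouse One", "Zhang San", "computer", 20),
    ("Warehouse No. 2", "Li Si", "Motor", 35),
    ("Warehouse No. 2", "Li Si", "air conditioning", 26),
])

# Operation 1: storing a tablet forces us to repeat the administrator's name.
conn.execute("INSERT INTO stock VALUES (?, ?, ?, ?)",
             ("Warehouse One", "Zhang San", "tablet", 5))  # quantity 5 is made up

# Operation 2: the new administrator must be written into every row of the warehouse.
cur = conn.execute("UPDATE stock SET administrator = ? WHERE warehouse = ?",
                   ("Wang Wu", "Warehouse No. 2"))
rows_touched = cur.rowcount
print(rows_touched)  # 2

# Simulate a careless edit that changes only one of those rows.
conn.execute("UPDATE stock SET administrator = 'Li Si' WHERE item = 'Motor'")
admins = conn.execute(
    "SELECT COUNT(DISTINCT administrator) FROM stock WHERE warehouse = 'Warehouse No. 2'"
).fetchone()[0]
print(admins)  # 2: the table now contradicts itself
```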
The cause of these problems: prime attributes have partial and transitive functional dependencies on the candidate keys. (In this example, the prime attribute warehouse name depends on only part of the candidate key (administrator, item name).)
The solution is to eliminate, on top of 3NF, these partial and transitive dependencies of prime attributes on candidate keys.
That is, split the table above into two: a warehouse item table and a warehouse administrator table.
Warehouse item table
Warehouse name | Item name | Quantity
Warehouse One | Cell phone | 12
Warehouse One | computer | 20
Warehouse No. 2 | Motor | 35
Warehouse No. 2 | air conditioning | 26
Warehouse administrator table
Warehouse name | Administrator name
Warehouse One | Zhang San
Warehouse No. 2 | Li Si
With this design, the operations above no longer cause these problems. A database design that follows these rules satisfies BCNF.
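Here are the same two operations against the decomposed schema, again as a sqlite3 sketch with hypothetical table names and a made-up tablet quantity. The `UNIQUE` constraint on the administrator column encodes the rule that each administrator works in only one warehouse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- warehouse -> administrator and administrator -> warehouse:
    -- both columns are unique, so each one is a key of this table
    CREATE TABLE warehouse_admin (
        warehouse     TEXT PRIMARY KEY,
        administrator TEXT NOT NULL UNIQUE
    );
    -- (warehouse, item) -> quantity
    CREATE TABLE warehouse_item (
        warehouse TEXT REFERENCES warehouse_admin(warehouse),
        item      TEXT,
        quantity  INTEGER NOT NULL,
        PRIMARY KEY (warehouse, item)
    );
    INSERT INTO warehouse_admin VALUES
        ('Warehouse One', 'Zhang San'),
        ('Warehouse No. 2', 'Li Si');
    INSERT INTO warehouse_item VALUES
        ('Warehouse One', 'Cell phone', 12),
        ('Warehouse One', 'computer', 20),
        ('Warehouse No. 2', 'Motor', 35),
        ('Warehouse No. 2', 'air conditioning', 26);
""")

# Operation 1: stocking a tablet needs no administrator name (quantity 5 is made up).
conn.execute("INSERT INTO warehouse_item VALUES (?, ?, ?)", ("Warehouse One", "tablet", 5))

# Operation 2: reassigning the administrator is a single-row update.
cur = conn.execute(
    "UPDATE warehouse_admin SET administrator = ? WHERE warehouse = ?",
    ("Wang Wu", "Warehouse No. 2"),
)
updated = cur.rowcount
print(updated)  # 1

# A join still answers "who manages the warehouse that holds the Motor?"
admin = conn.execute("""
    SELECT a.administrator FROM warehouse_item i
    JOIN warehouse_admin a ON a.warehouse = i.warehouse
    WHERE i.item = 'Motor'
""").fetchone()[0]
print(admin)  # Wang Wu
```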
Database transactions
Besides the normal forms of database design, transaction processing is another important means of ensuring data integrity. A transaction is a single unit of work that can contain several operations which together complete one task. A lock restricts access to data in a multi-user environment. Together, transactions and locks ensure data integrity.
Transaction processing
Commit: when every step of the transaction has executed successfully, the transaction is committed.
Rollback: if any step fails, none of the steps are committed; the transaction must be rolled back, returning the data to the state it was in before the transaction began.
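A minimal sqlite3 sketch of both outcomes, assuming a hypothetical `account` table. Opening the connection with `isolation_level=None` stops the module from opening transactions implicitly, so `BEGIN`, `COMMIT` and `ROLLBACK` are issued explicitly; the failure in the second transaction is simulated with a raised exception:

```python
import sqlite3

# Autocommit mode: sqlite3 issues no implicit BEGIN, so transactions are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")

# Commit: both steps succeed, so their effects become permanent.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 30 WHERE name = 'A'")
conn.execute("UPDATE account SET balance = balance + 30 WHERE name = 'B'")
conn.execute("COMMIT")

# Rollback: a failure after the first step undoes that step as well.
conn.execute("BEGIN")
try:
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("simulated failure before B is credited")
except RuntimeError:
    conn.execute("ROLLBACK")

balances = dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
print(balances)  # {'A': 70, 'B': 80}
```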
ACID properties of transactions
Every transaction shares four characteristics, known as the ACID properties: atomicity, consistency, isolation and durability.
Atomicity: a transaction is treated as a single unit of work while it executes. A unit of work may include several steps, and the transaction is complete only when every step has completed. If any step fails for any reason, the whole transaction fails, and the steps already performed must be rolled back.
Consistency: a transaction keeps the data in a consistent state. If the system is consistent when the transaction starts, it must also be consistent when the transaction ends, whether the transaction succeeds or fails.
Isolation: data accessed by a transaction is not affected by changes made by other transactions until the transaction completes.
Durability: once a transaction has been committed successfully, the results it produced in the system are permanent.
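Atomicity and consistency can be illustrated together with a hypothetical `transfer` function. In this sqlite3 sketch, the consistency rule "a balance may never go negative" is an assumption expressed as a `CHECK` constraint, and `with conn:` commits the block on success and rolls it back if an exception escapes. Isolation and durability are not demonstrated, since they need concurrent connections and an on-disk database respectively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical consistency rule: balances must never go negative.
conn.execute("""
    CREATE TABLE account (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
with conn:
    conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit has to undo an earlier step.
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False

ok_small = transfer(conn, "A", "B", 30)   # succeeds and commits
ok_large = transfer(conn, "A", "B", 500)  # debit violates CHECK; the credit is rolled back too
balances = dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
print(ok_small, ok_large, balances)  # True False {'A': 70, 'B': 80}
```

Crediting before debiting is deliberate: it makes the failed transfer leave a half-finished change behind, which the rollback then removes.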
Well, that's all I'll share for now on the normal forms of database design and on database transactions. If you spot any mistakes or gaps, please point them out.
If you found this helpful, remember to like and follow!
Little Grey Ape will keep making progress alongside you!