Database design guidelines: the three normal forms

First we need to understand what a "normal form (NF)" is. The textbook definition is: "a set of relation schemas that conform to a certain level, representing how rational the relationships among the attributes within a relation are." Quite obscure, right? You can roughly think of it as the level of design standard that a table's structure conforms to. It is like buying building materials for home renovation: the most environmentally friendly grade is E0, followed by E1, then E2, and so on. Database normal forms are graded in the same way: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF. In general, when designing a relational database, going as far as BCNF is enough. A design that satisfies a higher normal form necessarily satisfies the lower ones; for example, a relation schema in 2NF is necessarily in 1NF.

Next we go through the normal forms one by one, starting with first normal form (1NF).

A relation that satisfies 1NF is defined as follows: every attribute in it is atomic, that is, it cannot be subdivided. (You can think of a relation as a data table. The distinction between a "relation schema" and a "relation" is similar to the distinction between a "class" and an "object" in object-oriented programming: a relation is an instance of a relation schema, so you can treat the relation as a table holding data and the relation schema as the structure of that table.) The case shown in Table 1 does not meet the requirements of 1NF.


Table 1

In fact, 1NF is the most basic requirement of every relational database. When you create a table in a relational database management system (RDBMS) such as SQL Server, Oracle, or MySQL, the operation will not succeed if the table design does not meet this basic requirement. In other words, any table that already exists in an RDBMS necessarily satisfies 1NF. To represent the data of Table 1 in an RDBMS, the table has to be designed in the form of Table 2:


Table 2

However, a design that only satisfies 1NF still suffers from excessive data redundancy, insertion anomalies, deletion anomalies, and update anomalies. Consider the design in Table 3:


Table 3

  1. Each student's ID, name, department name, and department head are repeated many times, and each department's head is repeated many times as well - excessive data redundancy
  2. If the school creates a new department that has not yet enrolled any students (say it is created in March but will not enroll until August), the department name and department head cannot be added to the table on their own (Note 1) - insertion anomaly

    Note 1: According to entity integrity, one of the three kinds of relational integrity constraints, no attribute contained in a relation's key (Note 2) may be empty, and the combination of the key's attributes may not repeat. To meet this requirement, the table shown can only use the combination of student ID and course name as its key; otherwise the records cannot be uniquely distinguished.

    Note 2: Key: an attribute, or a combination of attributes, of a relation that is used to distinguish each tuple (a "tuple" can be understood as a record in the table, i.e. a row).
  3. If all records of the students in some department are deleted, all data about the department and its head disappears with them (a department having no students left does not mean the department itself is gone) - deletion anomaly
  4. If Li Xiaoming transfers to the Law department, then to keep the database consistent, the department and department head values must be modified in three records - update anomaly

Because a database design that only satisfies 1NF has these problems, we need to raise the design standard and remove the factors that cause the four problems above, so that the design satisfies a higher normal form (2NF). This process is called "normalization".

I will not present the strict relational-theory definition of second normal form (2NF) here (it requires too much groundwork); it is enough to know what 2NF improves over 1NF. The improvement is that 2NF, on top of 1NF, eliminates partial functional dependencies of non-prime attributes on the key. The four concepts in that sentence, "functional dependency", "key", "non-prime attribute", and "partial functional dependency", are explained next.

Functional dependency
It can be understood like this (not a particularly strict definition): if, in a table, fixing the value of attribute (or attribute group) X always fixes the value of attribute Y, then Y is said to be functionally dependent on X, written X → Y. In other words, the table contains no two records that have the same value for X but different values for Y. This is where the name "functional dependency" comes from: it resembles the function y = f(x), where once the value of x is fixed, the value of y is fixed as well.

For example, in the data of Table 3 you cannot find two records that have the same student ID but different names. So we can say that name is functionally dependent on student ID, written student ID → name. The reverse does not hold: students may share a name, so two different student records may have the same name but different student IDs, and we cannot say that student ID is functionally dependent on name. Other functional dependencies in the table include:

  • Department name → department head
  • Student ID → department head
  • (Student ID, course name) → score

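However, the following functional dependencies do not hold:

  • Student ID → course name
  • Student ID → score
  • Course name → department head

Note that (student ID, course name) → name does hold, because student ID alone already determines name; it is a partial dependency rather than a full one.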
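
The "no two records agree on X but differ on Y" test can be tried out directly on data. The sketch below is only an illustration: the rows, the English column names, and the helper `fd_holds` are all hypothetical stand-ins in the shape of Table 3, not the original data. Keep in mind that data can only refute a dependency; whether one truly holds is a matter of the business rules, not of one sample.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True


# Hypothetical sample rows shaped like Table 3 (not the original data).
COLUMNS = ("student_id", "name", "dept", "dept_head", "course", "score")
SAMPLE = [dict(zip(COLUMNS, r)) for r in [
    ("S01", "Li Xiaoming", "Economics", "Wang Qiang", "Calculus", 95),
    ("S01", "Li Xiaoming", "Economics", "Wang Qiang", "English", 87),
    ("S02", "Zhang Lili", "Economics", "Wang Qiang", "Calculus", 72),
    ("S03", "Gao Fangfang", "Law", "Liu Ling", "Calculus", 88),
    ("S03", "Gao Fangfang", "Law", "Liu Ling", "Legal Studies", 82),
]]

print(fd_holds(SAMPLE, ("student_id",), ("name",)))            # True
print(fd_holds(SAMPLE, ("dept",), ("dept_head",)))             # True
print(fd_holds(SAMPLE, ("student_id", "course"), ("score",)))  # True
print(fd_holds(SAMPLE, ("student_id",), ("score",)))           # False
print(fd_holds(SAMPLE, ("course",), ("dept_head",)))           # False
```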

Three further concepts build on "functional dependency":

Full functional dependency

In a table, if X → Y holds and X' → Y does not hold for any proper subset X' of X (this only matters when the attribute group X contains more than one attribute), then Y is said to be fully functionally dependent on X, written X F→ Y. (The F should be written directly above the arrow, which cannot be typed here; the correct notation is shown in Figure 1.)

Figure 1

For example:

  • Student ID F→ name
  • (Student ID, course name) F→ score (Note: a student ID alone does not determine a score, and a course name alone does not determine one either)

Partial functional dependency

If Y is functionally dependent on X but not fully functionally dependent on X, then Y is said to be partially functionally dependent on X, written X P→ Y, as shown in Figure 2.

Figure 2


For example:

  • (Student ID, course name) P→ name


Transitive functional dependency
If Z is functionally dependent on Y and Y is functionally dependent on X (thanks to @Pictet for pointing out an error; the precondition "Y is not contained in X, and X is not functionally dependent on Y" is added here), then Z is said to be transitively functionally dependent on X, written X T→ Z, as shown in Figure 3.

Figure 3
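
The three kinds of dependency can also be told apart mechanically once the dependencies are written down. The following is a minimal sketch, assuming the dependency set listed in `FDS` (my reading of the Table 3 examples, with hypothetical English attribute names). `closure`, `classify`, and `is_transitive` are illustrative helpers, not part of any library.

```python
from itertools import combinations

# Dependencies assumed for Table 3, as (determinant, dependents) pairs.
FDS = [
    ({"student_id"}, {"name", "dept"}),
    ({"dept"}, {"dept_head"}),
    ({"student_id", "course"}, {"score"}),
]


def closure(attrs, fds):
    """Return every attribute that `attrs` determines under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def classify(x, y, fds):
    """Classify how attribute y depends on attribute set x: none, full, or partial."""
    x = set(x)
    if y not in closure(x, fds):
        return "none"
    for size in range(1, len(x)):
        for subset in combinations(sorted(x), size):
            if y in closure(subset, fds):
                return "partial"  # a proper subset of x already determines y
    return "full"


def is_transitive(x, y, z, fds):
    """True if x -> y -> z, with y not inside x and y not determining x."""
    x, y = set(x), set(y)
    return (y <= closure(x, fds) and z in closure(y, fds) and z not in y
            and not y <= x and not x <= closure(y, fds))


print(classify({"student_id"}, "name", FDS))             # full
print(classify({"student_id", "course"}, "score", FDS))  # full
print(classify({"student_id", "course"}, "name", FDS))   # partial
print(is_transitive({"student_id"}, {"dept"}, "dept_head", FDS))  # True
```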

Key
Let K be an attribute or attribute group of a table. If every attribute other than K is fully functionally dependent on K (do not leave out the "fully"), then K is called a candidate key, or key for short. In practice this can usually be understood as: if fixing the value of K fixes the values of all other attributes of the table, and no proper subset of K already does so, then K is a key. A table can have more than one key. (For practical convenience, one of the keys is usually chosen as the primary key.)

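For example:
For Table 3, the attribute group (student ID, course name) is the key, and it is the only key of the table (assuming no two courses share a name).

Non-prime attributes
An attribute that is not contained in any key is a non-prime attribute; otherwise it is a prime attribute.

For example:
For Table 3, there are two prime attributes: student ID and course name.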


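Now we can finally return to 2NF. First, we need to decide whether Table 3 meets the requirements of 2NF. By the definition of 2NF, the test is whether the table contains any partial functional dependency of a non-prime attribute on a key. If it does, the table satisfies at most 1NF; if it does not, the table satisfies 2NF. The procedure is:

Step 1: Find all keys of the table.
Step 2: From the keys found in step 1, find all prime attributes.
Step 3: Remove all prime attributes from the table; the attributes that remain are the non-prime attributes.
Step 4: Check whether any non-prime attribute is partially functionally dependent on a key.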

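For Table 3, the four steps go as follows:

Step 1:

  1. Check every single attribute: once its value is fixed, are the values of all remaining attributes fixed?
  2. Check every attribute group of two attributes: once its value is fixed, are the values of all remaining attributes fixed?
  3. ……
  4. Check the attribute group containing all six attributes: once its value is fixed, are the values of all remaining attributes fixed?

This looks tedious, but there is a trick: if A is a key, then no attribute group that contains A, such as (A, B), (A, C), or (A, B, C), is a key (because the definition of a key requires full functional dependency).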

Figure 4 shows all functional dependencies in the table:

Figure 4

After this step, we find that Table 3 has exactly one key: (student ID, course name).
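
Step 1, including the trick of skipping every group that already contains a key, can be sketched as a small search. This assumes the dependency set in `FDS` below (my reading of the dependencies discussed for Table 3, with hypothetical English attribute names); `closure` and `candidate_keys` are illustrative helpers.

```python
from itertools import combinations

ATTRS = {"student_id", "name", "dept", "dept_head", "course", "score"}

# Dependencies assumed for Table 3, as (determinant, dependents) pairs.
FDS = [
    ({"student_id"}, {"name", "dept"}),
    ({"dept"}, {"dept_head"}),
    ({"student_id", "course"}, {"score"}),
]


def closure(attrs, fds):
    """Return every attribute that `attrs` determines under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Try attribute groups from smallest to largest, as in step 1."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            group = set(combo)
            if any(key <= group for key in keys):
                continue  # contains a smaller key, so it cannot be a key
            if closure(group, fds) == set(attrs):
                keys.append(group)
    return keys


keys = candidate_keys(ATTRS, FDS)
print(keys)  # a single key made of student_id and course
```

Because groups are tried in order of increasing size, any group that determines everything and contains no earlier key is minimal, which is what the "fully" in the definition of a key demands.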

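Step 2:
There are two prime attributes: student ID and course name.


Step 3:
There are four non-prime attributes: name, department name, department head, and score.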


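Step 4:
For (student ID, course name) → name, we have student ID → name, so the non-prime attribute name is partially functionally dependent on the key (student ID, course name).
For (student ID, course name) → department name, we have student ID → department name, so the non-prime attribute department name is partially functionally dependent on the key (student ID, course name).
For (student ID, course name) → department head, we have student ID → department head, so the non-prime attribute department head is partially functionally dependent on the key (student ID, course name).

So Table 3 contains partial functional dependencies of non-prime attributes on the key; it satisfies at most 1NF and does not satisfy 2NF.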
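
Step 4 can be sketched the same way: for every proper subset of a key, list the non-prime attributes it already determines. The dependency set `FDS`, the key list `KEYS`, and the English attribute names are assumptions matching the analysis above; `partial_dependencies` is a hypothetical helper.

```python
from itertools import combinations

# Dependencies and key assumed for Table 3.
FDS = [
    ({"student_id"}, {"name", "dept"}),
    ({"dept"}, {"dept_head"}),
    ({"student_id", "course"}, {"score"}),
]
KEYS = [{"student_id", "course"}]


def closure(attrs, fds):
    """Return every attribute that `attrs` determines under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(keys, fds):
    """List (part_of_key, non_prime_attribute) pairs that break 2NF."""
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                for attr in sorted(closure(part, fds) - prime):
                    found.append((part, attr))
    return found


violations = partial_dependencies(KEYS, FDS)
for part, attr in violations:
    print(part, "->", attr)
print("in 2NF:", not violations)  # in 2NF: False
```

A table whose keys are all single attributes yields an empty list here, since a one-attribute key has no non-empty proper subset.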


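To make Table 3 satisfy 2NF we must eliminate these partial functional dependencies, and the only way is to split the large table into two or more smaller tables such that the result meets the requirements of the higher normal form. This process is called "schema decomposition". A decomposition is not unique; here is one possibility:
Enrollment (student ID, course name, score)
Student (student ID, name, department name, department head)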

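Let us first check whether the Enrollment table and the Student table satisfy 2NF.

For the Enrollment table, the key is (student ID, course name), the prime attributes are student ID and course name, and the non-prime attribute is score. Student ID alone does not uniquely determine score, and course name alone does not either, so the non-prime attribute score is not partially functionally dependent on the key (student ID, course name), and the table satisfies 2NF.

For the Student table, the key is student ID, the prime attribute is student ID, and the non-prime attributes are name, department name, and department head. Because the key consists of a single attribute, no non-prime attribute can be partially functionally dependent on it, so the table satisfies 2NF.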

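Figure 5 shows the functional dependencies after the schema decomposition:

Figure 5

Table 4 shows the data after the schema decomposition:

Table 4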
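
In terms of data, splitting the table amounts to projecting the rows onto each new table's columns and dropping the duplicates. The sketch below uses hypothetical rows shaped like the original single table (not the data of Table 4); `project` is an illustrative helper.

```python
def project(rows, columns):
    """Project rows onto `columns`, dropping duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result


# Hypothetical rows in the shape of the undecomposed table.
COLUMNS = ("student_id", "name", "dept", "dept_head", "course", "score")
ROWS = [dict(zip(COLUMNS, r)) for r in [
    ("S01", "Li Xiaoming", "Economics", "Wang Qiang", "Calculus", 95),
    ("S01", "Li Xiaoming", "Economics", "Wang Qiang", "English", 87),
    ("S02", "Zhang Lili", "Economics", "Wang Qiang", "Calculus", 72),
    ("S03", "Gao Fangfang", "Law", "Liu Ling", "Calculus", 88),
    ("S03", "Gao Fangfang", "Law", "Liu Ling", "Legal Studies", 82),
]]

enrollment = project(ROWS, ("student_id", "course", "score"))
student = project(ROWS, ("student_id", "name", "dept", "dept_head"))

print(len(enrollment))  # 5: one row per (student, course) pair
print(len(student))     # 3: each student now appears once
```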

(There is also the question of how to carry out a schema decomposition correctly, which is not covered here.)

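Now let us see whether the earlier problems still exist when we perform the same operations:

  1. Li Xiaoming transfers to the Law department
    Li Xiaoming's department value only needs to be modified once. - improved
  2. Is data redundancy reduced?
    Student names, department names, and department heads are no longer repeated as many times as before. - improved
  3. Delete all student records of a department
    All information about that department is still lost. - not improved
  4. Insert information about a new department that has no students yet
    Because the key of the Student table is student ID, which cannot be empty, the operation is not allowed. - not improved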

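So in many cases satisfying only 2NF is still not enough. The cause of the remaining problems is that the non-prime attribute department head is still transitively functionally dependent on the key student ID. To resolve them, the 2NF tables need to be improved further so that they satisfy 3NF.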

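Third normal form (3NF): on top of 2NF, 3NF eliminates transitive functional dependencies of non-prime attributes on the key. In other words, if any non-prime attribute is transitively functionally dependent on a key, the design does not satisfy 3NF.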

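Next, let us check whether the design in Table 4 satisfies 3NF.

For the Enrollment table, the primary key is (student ID, course name), the prime attributes are student ID and course name, and the only non-prime attribute is score, so no transitive functional dependency is possible and the design of the Enrollment table satisfies 3NF.

For the Student table, the primary key is student ID, the prime attribute is student ID, and the non-prime attributes are name, department name, and department head. Because student ID → department name and department name → department head, the non-prime attribute department head is transitively functionally dependent on the key student ID, so the design of the Student table does not satisfy 3NF.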
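
This check can be sketched in code as well. The version below uses the common textbook formulation (a dependency X → A breaks 3NF when X is not a superkey and A is non-prime) rather than a literal search for transitive chains; for a table that is already in 2NF the two should flag the same dependencies. The attribute names and per-table dependency lists are assumptions written out by hand.

```python
def closure(attrs, fds):
    """Return every attribute that `attrs` determines under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attrs, fds, keys):
    """List (determinant, attribute) pairs that break 3NF."""
    prime = set().union(*keys)
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attrs)
        for attr in sorted(rhs - lhs):
            if not is_superkey and attr not in prime:
                violations.append((sorted(lhs), attr))
    return violations


# The Student table of the 2NF design, with its assumed dependencies.
student = third_nf_violations(
    {"student_id", "name", "dept", "dept_head"},
    [({"student_id"}, {"name", "dept"}), ({"dept"}, {"dept_head"})],
    [{"student_id"}],
)
# The Enrollment table of the 2NF design.
enrollment = third_nf_violations(
    {"student_id", "course", "score"},
    [({"student_id", "course"}, {"score"})],
    [{"student_id", "course"}],
)

print(student)     # [(['dept'], 'dept_head')]
print(enrollment)  # []
```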

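To bring the design to 3NF, we must decompose the schema further into the following form:
Enrollment (student ID, course name, score)
Student (student ID, name, department name)
Department (department name, department head)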

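The Enrollment table satisfies 3NF, as analyzed above.

For the Student table, the key is student ID, the prime attribute is student ID, and the non-prime attributes are name and department name. Neither of these determines the other, so no non-prime attribute is transitively functionally dependent on the key, and the table satisfies 3NF.

For the Department table, the key is department name, the prime attribute is department name, and the non-prime attribute is department head. No non-prime attribute can be transitively functionally dependent on the key (a transitive functional dependency needs at least three attributes), so the table satisfies 3NF.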


The new functional dependencies are shown in Figure 6:


Figure 6

The new tables are shown in Table 5:


Table 5


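Now let us see whether the earlier problems still exist when we perform the same operations:

  1. Delete all student records of a department
    The department's information is not lost. - improved
  2. Insert information about a new department that has no students yet
    Because the Department table and the Student table are now two separate tables, this is not a problem. - improved
  3. Data redundancy is reduced further. - improved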
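
These operations can be tried out on a throwaway database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names, column names, and sample values are all hypothetical, chosen only to mirror the three-table design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept      TEXT PRIMARY KEY,
        dept_head TEXT NOT NULL
    );
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,
        name       TEXT NOT NULL,
        dept       TEXT NOT NULL REFERENCES department(dept)
    );
    CREATE TABLE enrollment (
        student_id TEXT NOT NULL REFERENCES student(student_id),
        course     TEXT NOT NULL,
        score      INTEGER,
        PRIMARY KEY (student_id, course)
    );
""")
conn.execute("INSERT INTO department VALUES ('Economics', 'Wang Qiang')")
conn.execute("INSERT INTO department VALUES ('Law', 'Liu Ling')")
conn.execute("INSERT INTO student VALUES ('S01', 'Li Xiaoming', 'Economics')")
conn.execute("INSERT INTO enrollment VALUES ('S01', 'Calculus', 95)")

# Insert: a new department with no students yet is accepted.
conn.execute("INSERT INTO department VALUES ('Physics', 'Zhao Min')")

# Delete: removing every student of a department keeps the department row.
conn.execute("DELETE FROM enrollment WHERE student_id = 'S01'")
conn.execute("DELETE FROM student WHERE dept = 'Economics'")

departments = [row[0] for row in
               conn.execute("SELECT dept FROM department ORDER BY dept")]
students_left = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(departments)    # ['Economics', 'Law', 'Physics']
print(students_left)  # 0
```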


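Conclusion
As we can see, a database design that satisfies 3NF largely resolves excessive data redundancy and the insertion, update, and deletion anomalies. In practice, of course, designs often go only as far as 2NF or even 1NF for the sake of performance or extensibility, but a database designer should at least know what 3NF requires.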

============================================================================

BCNF

To understand BCNF, first consider this problem:

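Suppose:

  1. A company has several warehouses;
  2. Each warehouse has exactly one manager, and each manager works in exactly one warehouse;
  3. A warehouse can store many kinds of items, and one kind of item can be stored in different warehouses. Each item has a quantity in each warehouse where it is stored.

Which normal form does the relation schema Warehouse (warehouse name, manager, item name, quantity) satisfy?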

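Answer: the known set of functional dependencies is: warehouse name → manager, manager → warehouse name, (warehouse name, item name) → quantity
Keys: (manager, item name), (warehouse name, item name)
Prime attributes: warehouse name, manager, item name
Non-prime attribute: quantity
∵ no non-prime attribute is partially or transitively functionally dependent on a key, ∴ this relation schema is in 3NF.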
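
The answer can be cross-checked with a short script. This is a sketch under the dependency set stated in the answer, with hypothetical English attribute names; `closure` and `candidate_keys` are illustrative helpers, and the 3NF test is the common textbook one (every dependency has a superkey on the left or only prime attributes on the right).

```python
from itertools import combinations

ATTRS = {"warehouse", "manager", "item", "quantity"}
FDS = [
    ({"warehouse"}, {"manager"}),
    ({"manager"}, {"warehouse"}),
    ({"warehouse", "item"}, {"quantity"}),
]


def closure(attrs, fds):
    """Return every attribute that `attrs` determines under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Try attribute groups from smallest to largest, skipping supersets of keys."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            group = set(combo)
            if any(key <= group for key in keys):
                continue
            if closure(group, fds) == set(attrs):
                keys.append(group)
    return keys


keys = candidate_keys(ATTRS, FDS)
prime = set().union(*keys)
non_prime = ATTRS - prime

# 3NF test: each dependency needs a superkey on the left or prime attributes on the right.
in_3nf = all(closure(lhs, FDS) >= ATTRS or (rhs - lhs) <= prime
             for lhs, rhs in FDS)

print(sorted(sorted(k) for k in keys))  # [['item', 'manager'], ['item', 'warehouse']]
print(sorted(non_prime))                # ['quantity']
print(in_3nf)                           # True
```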

A relation (concrete data) based on this relation schema might look like the one shown in the figure:


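Good. Now that this relation schema is in 3NF, does it still have problems? Consider the following operations:

  1. A new warehouse is added but holds no items yet. Can a manager be assigned to it? - No, because item name is also a prime attribute, and entity integrity requires that prime attributes not be empty.
  2. A warehouse is emptied, and all item storage records related to it must be deleted. What problem does this cause? - The information about the warehouse itself and its manager is deleted along with them.
  3. A warehouse changes its manager. What problem does this cause? - The manager has to be modified once for every item storage record the warehouse has.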

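From this we can conclude that in some special cases, even a relation schema that satisfies 3NF still suffers from insertion, update, and deletion anomalies, and is still not a "good" design.

The cause: prime attributes are partially or transitively functionally dependent on a key. (In this example, the prime attribute [warehouse name] is partially functionally dependent on the key [(manager, item name)].)

The solution is to eliminate, on top of 3NF, the partial and transitive functional dependencies of prime attributes on keys: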

Warehouse (warehouse name, manager)
Stock (warehouse name, item name, quantity)

With this decomposition, the earlier insertion, update, and deletion anomalies are resolved.
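
One common way to state the BCNF requirement is that the left-hand side of every non-trivial dependency must be a superkey. Under that formulation, the difference between the original schema and the decomposed one can be sketched as follows. The attribute names are hypothetical, and the dependency lists for the two smaller tables are written out by hand as an assumption rather than derived.

```python
def closure(attrs, fds):
    """Return every attribute that `attrs` determines under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """List non-trivial dependencies whose left-hand side is not a superkey."""
    return [(sorted(lhs), sorted(rhs)) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= set(attrs)]


# The original single-table schema.
original = bcnf_violations(
    {"warehouse", "manager", "item", "quantity"},
    [({"warehouse"}, {"manager"}),
     ({"manager"}, {"warehouse"}),
     ({"warehouse", "item"}, {"quantity"})],
)
# The two tables after decomposition.
warehouse = bcnf_violations(
    {"warehouse", "manager"},
    [({"warehouse"}, {"manager"}), ({"manager"}, {"warehouse"})],
)
stock = bcnf_violations(
    {"warehouse", "item", "quantity"},
    [({"warehouse", "item"}, {"quantity"})],
)

print(original)   # [(['warehouse'], ['manager']), (['manager'], ['warehouse'])]
print(warehouse)  # []
print(stock)      # []
```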

That is the explanation of BCNF.

Reposted from: https://www.zhihu.com/question/24696366/answer/29189700

Origin blog.csdn.net/qq_41681241/article/details/95334431