Those things about database normal forms (repost)

I recently reviewed database normal forms and found this article very easy to understand, so I am reposting it here.

 

http://blog.jobbole.com/92442/

Introduction

      The status of normal forms in database design has always been ambiguous. Textbooks give the academic definitions of the normal forms, but in practice they are not applied very widely. This article uses plain language and a simple database demo to take an unnormalized database step by step from first normal form to fourth normal form.

 

Goals of normalization

      Applying normal forms to a database has many benefits, but the most important ones boil down to three:

      1. Reduce data redundancy (this is the main benefit; the others follow from it)

      2. Eliminate anomalies (insert anomalies, update anomalies, delete anomalies)

      3. Make the data better organized
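An update anomaly from the list above can be reproduced in a few lines. This is only a minimal sketch using Python's built-in sqlite3 module; the table name `employee_flat` and the sample rows are hypothetical and do not come from the article's demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_flat ("
    " employeeId INTEGER, departmentName TEXT, departmentDescription TEXT)"
)
# Hypothetical rows: the department description is repeated for every employee.
conn.executemany(
    "INSERT INTO employee_flat VALUES (?, ?, ?)",
    [(1, "R&D", "Builds products"), (2, "R&D", "Builds products")],
)

# Update anomaly: the description is changed through one row only,
# so the other copy silently goes stale.
conn.execute(
    "UPDATE employee_flat SET departmentDescription = 'Builds and ships products'"
    " WHERE employeeId = 1"
)
descriptions = conn.execute(
    "SELECT DISTINCT departmentDescription FROM employee_flat"
    " WHERE departmentName = 'R&D'"
).fetchall()
print(descriptions)  # the same department now has two different descriptions
```

Because the description is stored once per employee, nothing stops the copies from drifting apart. Storing it once per department removes the problem.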

    

       But this is a double-edged sword: applying normal forms also has disadvantages, which are discussed later in the article.

 

What is a normal form

      Simply put, normal forms exist to eliminate duplicate data and reduce redundancy, so that the data in the database is better organized and disk space is used more efficiently. They form a hierarchy of normalization standards: satisfying a higher normal form requires satisfying every lower one first. (For example, a table in 2NF must also be in 1NF.)

 

DEMO

      Let's start with an unnormalized table, which looks like this:

[Figure 0nf: the unnormalized table]

First, a brief description of the table: employeeId is the employee ID, departmentName is the department name, job is the position, jobDescription is the job description, skill is the employee's skills, departmentDescription is the department description, and address is the employee's address.

Apply first normal form (1NF) to the table

    If every attribute of a relational schema R is an indivisible basic data item, then R ∈ 1NF.

    Simply put, first normal form means that every attribute is indivisible. A design that does not conform to first normal form cannot be called a relational database. In the table above, it is easy to see that address can be subdivided, for example "No. XX, XX District, XX Road, Beijing", so it clearly does not conform to first normal form. Applying first normal form means decomposing this attribute into another table, as follows:

[Figure 1nf: the tables after applying first normal form]
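The address split can be sketched with Python's built-in sqlite3 module. The column names `city`, `district`, `road`, and `houseNumber` are assumptions made for illustration; the article only states that the address is decomposed into another table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employeeId INTEGER PRIMARY KEY
);
-- Hypothetical column names: each part of the address becomes its own column.
CREATE TABLE address (
    employeeId  INTEGER PRIMARY KEY REFERENCES employee(employeeId),
    city        TEXT,
    district    TEXT,
    road        TEXT,
    houseNumber TEXT
);
""")
conn.execute("INSERT INTO employee VALUES (1)")
conn.execute(
    "INSERT INTO address VALUES (?, ?, ?, ?, ?)",
    (1, "Beijing", "XX District", "XX Road", "No. XX"),
)

# With atomic columns, filtering by city is an exact comparison
# instead of a substring search inside one long address string.
in_beijing = conn.execute(
    "SELECT employeeId FROM address WHERE city = 'Beijing'"
).fetchall()
print(in_beijing)
```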

Apply second normal form (2NF) to the table

If a relational schema R ∈ 1NF and every non-prime attribute is fully functionally dependent on the key of R, then R ∈ 2NF.

 

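Simply put, every attribute in the table must depend on the whole primary key, not on just part of it. A table whose primary key is a single column is therefore automatically in second normal form once it is in first normal form. The purpose is to further reduce insert anomalies and update anomalies. In the table above, departmentDescription is determined by the key column departmentName but not by the key column employeeId. It depends on only one of the two columns of the composite primary key, so it is partially dependent on the key. Applying second normal form gives the following tables: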

[Figure 3nf: the tables after applying second normal form]
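A minimal sketch of this step, assuming the composite key (employeeId, departmentName) described above. The exact layout of the article's tables is not recoverable, so the schema and sample rows below are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- departmentDescription depends only on departmentName, so it moves here.
CREATE TABLE department (
    departmentName        TEXT PRIMARY KEY,
    departmentDescription TEXT
);
CREATE TABLE employee (
    employeeId     INTEGER,
    departmentName TEXT REFERENCES department(departmentName),
    job            TEXT,
    PRIMARY KEY (employeeId, departmentName)
);
""")
# Hypothetical sample data.
conn.execute("INSERT INTO department VALUES ('R&D', 'Builds products')")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "R&D", "Developer"), (2, "R&D", "Tester")],
)

# The description is stored once, so a single UPDATE reaches every employee.
conn.execute(
    "UPDATE department SET departmentDescription = 'Builds and ships products'"
    " WHERE departmentName = 'R&D'"
)
rows = conn.execute(
    "SELECT e.employeeId, d.departmentDescription"
    " FROM employee e JOIN department d ON d.departmentName = e.departmentName"
    " ORDER BY e.employeeId"
).fetchall()
print(rows)
```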

Apply third normal form (3NF) to the table

If a relational schema R<U,F> contains no key X, attribute group Y, and non-prime attribute Z (Z ⊈ Y) such that X→Y, Y→Z, and Y↛X all hold, then R<U,F> ∈ 3NF.

 

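Simply put, third normal form eliminates transitive dependencies, where a non-key attribute depends on another non-key attribute rather than directly on the key. In the tables above, after second normal form has been applied, jobDescription (the job's responsibilities) is determined by job (the position), so jobDescription depends on job. This does not conform to third normal form. After applying third normal form, the relationship diagram is: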

[Figure 3nf1: the tables after applying third normal form]

In the tables above, non-key attributes no longer depend on one another, so the design conforms to third normal form.
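The transitive chain employeeId → job → jobDescription can be checked against sample data and then split apart. This is a sketch in plain Python: `fd_holds` is a hypothetical helper and the rows are made up. A dependency that holds in sample rows is only evidence, since the real dependency is a rule about the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows in the shape of the 2NF employee table.
rows = [
    {"employeeId": 1, "job": "Developer", "jobDescription": "Writes code"},
    {"employeeId": 2, "job": "Developer", "jobDescription": "Writes code"},
    {"employeeId": 3, "job": "Tester", "jobDescription": "Tests code"},
]

# X -> Y and Y -> Z hold, while Y -> X does not: a transitive dependency.
transitive = (
    fd_holds(rows, ["employeeId"], ["job"])
    and fd_holds(rows, ["job"], ["jobDescription"])
)
job_determines_employee = fd_holds(rows, ["job"], ["employeeId"])

# 3NF decomposition: jobDescription moves into its own table keyed by job.
employee_table = sorted({(r["employeeId"], r["job"]) for r in rows})
job_table = sorted({(r["job"], r["jobDescription"]) for r in rows})
print(transitive, job_determines_employee)
print(job_table)
```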

 

Apply Boyce-Codd normal form (BCNF) to the table

A relational schema R<U,F> ∈ 1NF is in BCNF if, for every functional dependency X→Y of R with Y ⊈ X, X contains a candidate key.

 

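Simply put, BCNF is a stricter version of third normal form. This demo treats it as each table having only one candidate key (a column whose value is different in every row can be called a candidate key). Strictly speaking, the definition above only requires every determinant to contain a candidate key, so a table with several candidate keys can still be in BCNF. In the noNf table from the third normal form step above, every employee's email is unique (would two people really share the same email?), so the demo treats this table as not conforming to BCNF. After applying BCNF, the relationship diagram is: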

[Figure bcnf: the tables after applying BCNF]
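The informal candidate-key test used above (every row has a different value) can be sketched as follows. `is_unique` is a hypothetical helper and the rows are made up. Uniqueness in sample data is necessary for a candidate key but does not prove one.

```python
def is_unique(rows, columns):
    """Return True if no two rows share the same values in the given columns."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return len(set(keys)) == len(keys)

# Hypothetical sample rows; the email column follows the article's remark
# that no two employees share an email address.
rows = [
    {"employeeId": 1, "email": "a@example.com", "job": "Developer"},
    {"employeeId": 2, "email": "b@example.com", "job": "Developer"},
]

# Single columns that could serve as candidate keys in this sample.
candidates = [c for c in ("employeeId", "email", "job") if is_unique(rows, [c])]
print(candidates)
```

Here both employeeId and email pass the check, which is the situation the demo resolves by keeping one candidate key per table.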

Apply fourth normal form (4NF) to the table

A relational schema R<U,F> ∈ 1NF is in 4NF if, for every non-trivial multivalued dependency X→→Y of R (Y ⊈ X), X contains a candidate key.

Simply put, fourth normal form eliminates multivalued dependencies in a table, which reduces the work of keeping the data consistent. In the BCNF table above, two possible values of an employee's skill are "C#, sql, javascript" and "C#, UML, Ruby". This attribute holds multiple values, which can make the database content inconsistent: for example, the first value might say "C#" while the second says "C#.net". The solution is to put the multivalued attribute into a new table. The relationship diagram after applying fourth normal form is as follows:

[Figure 4nf: the tables after applying fourth normal form]

For the skill table, the possible values are:

[Figure 4nfdemo: sample values in the skill table]
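Moving the multi-valued skill attribute into its own table might look like the sketch below. The table names `skill` and `employee_skill` and the column `skillId` are assumptions; only the two sample skill strings come from the text above.

```python
import sqlite3

# The two comma-separated skill values quoted in the article.
flat = [(1, "C#, sql, javascript"), (2, "C#, UML, Ruby")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: one row per skill, one row per (employee, skill) pair.
CREATE TABLE skill (
    skillId   INTEGER PRIMARY KEY,
    skillName TEXT UNIQUE
);
CREATE TABLE employee_skill (
    employeeId INTEGER,
    skillId    INTEGER REFERENCES skill(skillId),
    PRIMARY KEY (employeeId, skillId)
);
""")

for employee_id, skills in flat:
    for name in (s.strip() for s in skills.split(",")):
        # Each skill name is stored once; repeats are ignored.
        conn.execute("INSERT OR IGNORE INTO skill (skillName) VALUES (?)", (name,))
        skill_id = conn.execute(
            "SELECT skillId FROM skill WHERE skillName = ?", (name,)
        ).fetchone()[0]
        conn.execute("INSERT INTO employee_skill VALUES (?, ?)", (employee_id, skill_id))

skill_names = [r[0] for r in conn.execute("SELECT skillName FROM skill")]
csharp_users = [
    r[0]
    for r in conn.execute(
        "SELECT es.employeeId FROM employee_skill es"
        " JOIN skill s ON s.skillId = es.skillId"
        " WHERE s.skillName = 'C#' ORDER BY es.employeeId"
    )
]
print(skill_names, csharp_users)
```

Since "C#" exists as a single row in `skill`, both employees point at the same spelling and the "C#" versus "C#.net" mismatch cannot creep in through free text.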

 

Summary

     In the decomposition process above, it is easy to see that the higher the normal form applied, the more tables there are. Too many tables bring several problems:

1. Queries have to join multiple tables, which increases query complexity

2. Queries have to join multiple tables, which reduces query performance
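To make the join cost concrete, here is a sketch of reading back one employee's full record from a normalized schema. The three tables and their rows are hypothetical and only mirror the kind of split performed in the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (departmentName TEXT PRIMARY KEY, departmentDescription TEXT);
CREATE TABLE job (job TEXT PRIMARY KEY, jobDescription TEXT);
CREATE TABLE employee (employeeId INTEGER PRIMARY KEY, departmentName TEXT, job TEXT);
INSERT INTO department VALUES ('R&D', 'Builds products');
INSERT INTO job VALUES ('Developer', 'Writes code');
INSERT INTO employee VALUES (1, 'R&D', 'Developer');
""")

# What was a single-table SELECT before normalization now needs two joins.
query = """
SELECT e.employeeId, d.departmentDescription, j.jobDescription
FROM employee e
JOIN department d ON d.departmentName = e.departmentName
JOIN job j ON j.job = e.job
"""
row = conn.execute(query).fetchone()
print(row)

# The query plan lists the work SQLite does for each table involved.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step[-1])
```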

These days the cost of disk space is basically negligible, so the storage consumed by redundant data is no longer a reason in itself to apply normal forms.

So a higher normal form is not automatically better; it depends on the actual situation. Third normal form already greatly reduces data redundancy and the occurrence of insert, update, and delete anomalies. My personal opinion is that third normal form is sufficient in most cases, and second normal form is also acceptable in some.

 

My study of databases is still at an early stage, so if anything above is inaccurate, I hope the experts will point it out.

 

 

 

 
