The three database normal forms and denormalization (repost)

  Anyone who wants to grow into a project manager role needs to understand database design principles, even though much of the material is fairly theoretical. When we receive a new requirement, we work through it from start to finish: draw the flowchart -> use case diagrams -> database design -> development phase -> coding -> testing -> go-live. At that point the project is complete.

Here we discuss only the database design step. When it comes to database design, we have all heard of the first, second, and third normal forms. But do we understand what these normal forms really mean, and what we gain by using them? Keep these questions in mind while reading the rest of this article.

Normal form: the concept dates from the 1970s, when the British-born E. F. Codd (the father of the relational database) proposed the relational model. Normal forms are the foundation of relational database theory, and they are the rules and guidelines we follow when designing a database structure. Only by understanding them can we design efficient, elegant databases; otherwise the design may well be flawed. Eight normal forms are commonly listed: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, DKNF, 6NF. A design that meets the minimum requirements is in first normal form (1NF). A design that meets further requirements on top of 1NF is in second normal form (2NF), and so on. In practice only the first three are commonly used: first normal form (1NF), second normal form (2NF), and third normal form (3NF). Here is a brief introduction to these three.

◆  First normal form (1NF):  columns must be atomic, that is, a column cannot be subdivided into several other columns.
Consider this table: [contact] (name, gender, telephone)
If, in the real scenario, a contact has both a home phone and a business phone, this table structure does not reach 1NF. To comply with 1NF we only need to split the telephone column: [contact] (name, gender, home phone, business phone). 1NF is easy to recognize, but 2NF and 3NF are easy to confuse.
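
The split above can be sketched as follows. This is a minimal illustration, not a prescribed schema: it assumes SQLite via Python's built-in sqlite3 module, and the column names home_phone and business_phone and the sample row are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1NF version of [contact]: the single "telephone" column is split in two.
# Column names and the sample row are assumptions for illustration.
conn.execute(
    "CREATE TABLE contact ("
    " name TEXT, gender TEXT, home_phone TEXT, business_phone TEXT)"
)
conn.execute(
    "INSERT INTO contact VALUES (?, ?, ?, ?)",
    ("Alice", "F", "555-0100", "555-0199"),
)

# Each phone number can now be queried on its own, with no string parsing.
business_phone = conn.execute(
    "SELECT business_phone FROM contact WHERE name = ?", ("Alice",)
).fetchone()[0]
print(business_phone)
```

With a single telephone column holding both numbers, the same lookup would need to parse the stored string in application code.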

◆  Second normal form (2NF):  the table must first satisfy 1NF, and then meet two further requirements: first, the table must have a primary key; second, every column that is not part of the primary key must depend on the entire primary key, not on just part of it.
Consider an order details table: [OrderDetail] (OrderID, ProductID, UnitPrice, Discount, Quantity, ProductName).
Because a single order can contain several products, OrderID alone is not enough as a primary key; the primary key should be (OrderID, ProductID). Clearly Discount and Quantity depend on the whole primary key (OrderID, ProductID), while UnitPrice and ProductName depend only on ProductID. So the OrderDetail table does not meet 2NF. A design that does not meet 2NF is prone to redundant data.
[OrderDetail] can be split into [OrderDetail] (OrderID, ProductID, Discount, Quantity) and [Product] (ProductID, UnitPrice, ProductName), which removes the repeated UnitPrice and ProductName values from the original order details table.
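
A minimal sketch of this decomposition, assuming SQLite through Python's sqlite3 module. The table and column names come from the text; the column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    UnitPrice   REAL,
    ProductName TEXT
);
CREATE TABLE OrderDetail (
    OrderID   INTEGER,
    ProductID INTEGER REFERENCES Product(ProductID),
    Discount  REAL,
    Quantity  INTEGER,
    PRIMARY KEY (OrderID, ProductID)
);
INSERT INTO Product VALUES (1, 9.5, 'Tea'), (2, 20.0, 'Coffee');
INSERT INTO OrderDetail VALUES
    (100, 1, 0.0, 3), (100, 2, 0.1, 1), (101, 1, 0.0, 5);
""")

# The product name and price are stored once per product,
# not once per order line.
product_rows = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]

# A join rebuilds the original wide rows whenever they are needed.
wide_rows = conn.execute("""
    SELECT d.OrderID, d.ProductID, p.UnitPrice,
           d.Discount, d.Quantity, p.ProductName
    FROM OrderDetail d JOIN Product p ON p.ProductID = d.ProductID
    ORDER BY d.OrderID, d.ProductID
""").fetchall()
```

Product 1 appears in two order lines, yet its name and price live in exactly one Product row.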

◆  Third normal form (3NF):  the table must first satisfy 2NF; in addition, every non-key column must depend directly on the primary key, with no transitive dependency. That is, this situation must not exist: non-key column A depends on non-key column B, and non-key column B depends on the primary key.
Consider an order table [Order] (OrderID, OrderDate, CustomerID, CustomerName, CustomerAddr, CustomerCity) whose primary key is (OrderID).
Here the non-key columns OrderDate, CustomerID, CustomerName, CustomerAddr, and CustomerCity all depend fully on the primary key (OrderID), so the table complies with 2NF. The problem is that CustomerName, CustomerAddr, and CustomerCity depend directly on CustomerID (a non-key column) rather than directly on the primary key; they depend on the primary key only transitively, so the table does not meet 3NF.
Splitting [Order] into [Order] (OrderID, OrderDate, CustomerID) and [Customer] (CustomerID, CustomerName, CustomerAddr, CustomerCity) achieves 3NF.
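
The 3NF split can be sketched the same way, again assuming SQLite through Python's sqlite3 module. Table and column names follow the text; the types and sample rows are assumptions. Order is a reserved word in SQL, so the table name is written with brackets, which SQLite accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT,
    CustomerAddr TEXT,
    CustomerCity TEXT
);
CREATE TABLE [Order] (
    OrderID    INTEGER PRIMARY KEY,
    OrderDate  TEXT,
    CustomerID INTEGER REFERENCES Customer(CustomerID)
);
INSERT INTO Customer VALUES (7, 'Li Lei', '1 Main St', 'Beijing');
INSERT INTO [Order] VALUES (1, '2015-11-01', 7), (2, '2015-11-02', 7);
""")

# The customer moves: one UPDATE on one row, however many orders exist.
conn.execute(
    "UPDATE Customer SET CustomerCity = 'Shanghai' WHERE CustomerID = 7"
)

# Every order sees the new city through the join.
cities = [row[0] for row in conn.execute("""
    SELECT c.CustomerCity
    FROM [Order] o JOIN Customer c ON c.CustomerID = o.CustomerID
    ORDER BY o.OrderID
""")]
```

In the original single-table design, the same change would have to be applied to every order row belonging to that customer.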

Q: What is the difference between the second and third normal forms?
Second normal form: asks whether each non-key column depends on the whole primary key. A column that reaches the primary key indirectly, through another column, still passes this test.
Third normal form: asks whether each non-key column depends on the primary key directly, not through a transitive dependency. Only a table where this holds is in third normal form.
Q: What are the benefits of normal forms?
Normal forms avoid data redundancy, reduce the space the database takes up, and make data integrity easier to maintain.
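
The integrity benefit is easiest to see from what goes wrong without it. The sketch below reuses the pre-2NF [OrderDetail] table from earlier; it assumes SQLite through Python's sqlite3 module, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderDetail (
    OrderID INTEGER, ProductID INTEGER, UnitPrice REAL,
    Discount REAL, Quantity INTEGER, ProductName TEXT,
    PRIMARY KEY (OrderID, ProductID)
);
INSERT INTO OrderDetail VALUES
    (100, 1, 9.5, 0.0, 3, 'Tea'),
    (101, 1, 9.5, 0.0, 5, 'Tea');
""")

# A rename that touches only one of the duplicated copies...
conn.execute(
    "UPDATE OrderDetail SET ProductName = 'Green Tea' "
    "WHERE OrderID = 100 AND ProductID = 1"
)

# ...leaves product 1 with two different names.
names = sorted(
    row[0] for row in conn.execute(
        "SELECT DISTINCT ProductName FROM OrderDetail WHERE ProductID = 1"
    )
)
```

Once ProductName lives in a separate [Product] table, there is only one copy to update, so this kind of inconsistency cannot arise.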

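Along with the benefits above, normal forms also bring some drawbacks: the higher the normal form a design follows, the more tables it produces. A design in first normal form may need only one table; redesigning that table to second normal form may produce two or more; and going on to third normal form or higher produces more still. The more tables there are, the more often a query has to pull data from several of them, and such a query usually takes much longer than one that reads a single table.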

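In other words, the higher the normal form we use, the lower the performance of data operations tends to be. So when designing tables we should weigh the actual requirements before deciding whether to go to a higher normal form. In most projects third normal form is as far as we go: it meets the project's needs, performs well, and keeps the data easy to manage.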

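When the business involves a great many tables, those tables are frequently joined, and operations on them must be as fast as possible, we can consider denormalization. As the name suggests, denormalization does the opposite of what the normal forms require: the design deliberately allows a moderate amount of redundant data and uses that redundancy to shorten the time data operations take. In other words, it trades space for time: data is duplicated across tables so that queries can reduce or avoid joins between them.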

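Suppose we are working with a school timetable and have two tables: a student information table student(a_id,a_name,a_adress,b_id) and a course table subject(b_id,b_subject). We now need the course name and student name for every course selection: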

The SQL statement is: select B.b_id, B.b_subject, A.a_name from student A, subject B where A.b_id = B.b_id;

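When there is little data, this query is fine. But when both tables hold millions of rows, a problem appears: the query routinely takes several hundred milliseconds or even longer, which cannot meet our requirement for web page speed (generally no more than 100 milliseconds). What can we do? Denormalize, of course: add a redundant field, the student name, to the course table, so that the following query achieves the same result: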

The SQL statement is: select b_id, b_subject, a_name from subject B;

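Put the two queries side by side and look at the execution plan: the first query accounts for 92% of the cost and the second only 8%. In other words, the second query is more than 10 times as efficient as the first, a striking result.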
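
The two approaches can be compared on a small scale with the sketch below. It assumes SQLite through Python's sqlite3 module and made-up sample rows. To keep both designs side by side, the redundant student name is stored in a separate table called subject_wide; that name is hypothetical, since the article adds the column to subject itself. The sketch only checks that both queries return the same rows and does not reproduce the 92% / 8% cost figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (b_id INTEGER PRIMARY KEY, b_subject TEXT);
CREATE TABLE student (a_id INTEGER PRIMARY KEY, a_name TEXT,
                      a_adress TEXT, b_id INTEGER);
INSERT INTO subject VALUES (1, 'Math'), (2, 'Physics');
INSERT INTO student VALUES
    (1, 'Ann', 'addr1', 1), (2, 'Bob', 'addr2', 2), (3, 'Cid', 'addr3', 1);

-- Hypothetical denormalized copy: the student name is stored redundantly
-- next to the course, one row per (course, student) pair.
CREATE TABLE subject_wide (b_id INTEGER, b_subject TEXT, a_name TEXT);
INSERT INTO subject_wide
    SELECT B.b_id, B.b_subject, A.a_name
    FROM student A JOIN subject B ON A.b_id = B.b_id;
""")

# Normalized design: the answer requires a join.
joined = conn.execute("""
    SELECT B.b_id, B.b_subject, A.a_name
    FROM student A JOIN subject B ON A.b_id = B.b_id
    ORDER BY B.b_id, A.a_name
""").fetchall()

# Denormalized design: the same rows come from a single table, no join.
single = conn.execute("""
    SELECT b_id, b_subject, a_name FROM subject_wide
    ORDER BY b_id, a_name
""").fetchall()
```

The price of the faster read is on the write side: renaming a student now means updating every redundant copy of the name as well as the student table.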

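Summary:

Once we start a project, the way we apply normal forms changes like this:

third normal form database design -> as the data grows into the millions of rows and multi-table data is frequently read and written at large scale -> denormalized database design -> the site's data keeps growing -> a database design that combines normalization and denormalization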

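When the data volume is very large, changing the database design is not the only option; we can also cache at the data layer. One tool that works well today is Memcached, a distributed caching system. We store database records as entity objects, along with image files and the like, in Memcached. Any serializable data can be serialized, stored in Memcached, and then quickly retrieved and deserialized at any time. Memcached can cache large amounts of data and keeps the cached data seen by multiple web servers consistent.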
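
The read path of such a cache can be sketched as follows. This is an illustration of the general read-through idea only, under loud assumptions: a plain Python dict stands in for a Memcached client so the example runs without a server, pickle stands in for whatever serializer the application uses, and the key format subject:<id> is made up.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subject (b_id INTEGER PRIMARY KEY, b_subject TEXT)")
conn.execute("INSERT INTO subject VALUES (1, 'Math')")

cache = {}                 # stand-in for a Memcached client: key -> bytes
stats = {"db_reads": 0}    # counts how often the database is really queried


def get_subject(b_id):
    """Return (b_id, b_subject), querying the database only on a cache miss."""
    key = f"subject:{b_id}"
    blob = cache.get(key)
    if blob is not None:
        return pickle.loads(blob)        # cache hit: deserialize and return
    stats["db_reads"] += 1
    row = conn.execute(
        "SELECT b_id, b_subject FROM subject WHERE b_id = ?", (b_id,)
    ).fetchone()
    if row is not None:
        cache[key] = pickle.dumps(row)   # store the serialized row
    return row


first = get_subject(1)    # miss: reads the database and fills the cache
second = get_subject(1)   # hit: served from the cache
```

With a real Memcached deployment the dict would be replaced by a client library, and the code that updates a row would also need to delete or refresh the matching cache key.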

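If you have your own insights into the normal forms and denormalization discussed above, please leave a comment and join the discussion.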


Reposted from: http://accpchf.iteye.com/blog/1120765


Origin blog.csdn.net/u011248560/article/details/49611839