Why I Don't Like the Three Database Normal Forms [Huawei Cloud Technology Sharing]

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this statement.
Original link: https://blog.csdn.net/devcloud/article/details/99317716

An Episode

Recently a young cousin of mine, a distant relative who was about to choose a major, came to me and asked:

"Brother, is there any future in learning databases these days?"
"Of course there is. A very bright one."
"Then if I start learning databases, where should I begin?"
"As for courses, first get to know the three database normal forms, SQL, that sort of thing."
"I roughly know what SQL is. What are the three normal forms?"
"Uh... the three normal forms are things like the uniqueness of a table's primary key... yes, uh... that should be it."
"What is a primary key?"
"Er... cousin, stop asking me. Can't you just look it up on Baidu?"
"Oh..."

After hanging up the phone, I sighed. Faced with the indisputable fact that I could barely remember the three normal forms, I quietly opened Google...

The concept of the three database normal forms is probably familiar to most people. It has been handed down from those half-understood college textbooks (if I remember correctly, it was covered in an introductory database systems textbook).
I remember when I first started looking for an internship. My abilities were so limited that I did not even know how to write a resume, and the section on technical strengths was a complete blank.
So I got hold of the resumes of a few top students to use as references, and sure enough, every one of them proudly said something like:

Mastery of the three database normal forms; proficient in database development languages.

Or:
Familiar with ER diagram tools; able to design databases that satisfy the three normal forms.

At first I felt that the three normal forms were indeed a good thing, until an interviewer asked about their technical details and I was caught off guard and at a loss.
As my work experience grew, the strong impression that normal-form theory had left in my mind gradually faded. I think either my memory has declined, or some of the principles have turned into instinct.

So, what are the database normal forms?

Definitions of the Three Normal Forms

I do not want to spend much space on theory here; that kind of material is a dime a dozen. Let's get a taste through a few simple examples.

1. First Normal Form

Suppose there is a user information table that, in addition to the user ID and name, also records address information:

ID    Name       Gender  Location
0001  Joe Smith  Male    Shenzhen, Guangdong Province
0002  John Doe   Female  Haikou, Hainan Province

Here, the location column does not conform to the first normal form (1NF):
First normal form (1NF): every column of a database table must be an indivisible atomic item.

Therefore, it should be split into:

ID    Name       Gender  Province   City
0001  Joe Smith  Male    Guangdong  Shenzhen
0002  John Doe   Female  Hainan     Haikou
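
The benefit of atomic columns can be sketched in a few lines. This is only an illustration, assuming SQLite through Python's standard `sqlite3` module; the table name `users`, the column names, and the helper `users_in_province` are hypothetical and not from the original article.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id TEXT PRIMARY KEY, name TEXT, gender TEXT, province TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?)",
    [
        ("0001", "Joe Smith", "male", "Guangdong", "Shenzhen"),
        ("0002", "John Doe", "female", "Hainan", "Haikou"),
    ],
)


def users_in_province(province):
    """Return user names for a province using an exact match on the atomic column."""
    rows = conn.execute(
        "SELECT name FROM users WHERE province = ? ORDER BY id", (province,)
    )
    return [name for (name,) in rows]


print(users_in_province("Hainan"))
```

With a single combined location column, the same lookup would need string matching on free-form text, which is harder to index and easier to get wrong.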

2. Second Normal Form

Take an order table as an example. A single order placed on Taobao usually contains multiple items, as follows:

Order No.  Product No.  Product Name       Price
o1         g1           Laundry detergent  23
o1         g2           Hair dryer         125
o1         g3           Broad beans        5
o2         g9           Quilt              302
o2         g8           Pillow             69

This brings us to the definition of the second normal form:
Second normal form (2NF): every table must have one and only one primary key (which may be a composite of several columns), and every other attribute must depend on the whole primary key.

The second normal form builds on the first: a table must satisfy 1NF before it can satisfy 2NF.

The first requirement of 2NF is that a unique primary key exists. In the table above, the order number and the product number must together form a composite primary key to meet this requirement.
What about the second requirement? Do the other attributes depend entirely on the primary key?
In the order scenario we can argue that they do. Product prices and even product names may change, but the information recorded in each order should stay the same.
Nobody wants the product information in an order they have already paid for to change suddenly. More importantly, the total price of the order must stay consistent with the recorded item prices.
So the records here can be regarded as a snapshot of the product information at the time the order was created.

However, the following scenario may not be appropriate:

Order No.  Product No.  Product Name       Price  Category
o1         g1           Laundry detergent  23     Household
o1         g2           Hair dryer         125    Appliances
o1         g3           Broad beans        5      Food
o2         g9           Quilt              302    Household
o2         g8           Pillow             69     Household

A product's category is generally fixed. In other words, the category attribute is tied only to the product number, so it depends on only part of the primary key.
This violates the 2NF rule that "other attributes must depend entirely on the primary key", so the attribute needs to be moved into a separate product information table.
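
One way to picture this split is the sketch below. It assumes SQLite through Python's `sqlite3` module; the table names `products` and `order_items`, their columns, and the helper `categories_for_order` are hypothetical names chosen for illustration. The product name and price stay in the order table as a snapshot, while the category moves to the product table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    product_no TEXT PRIMARY KEY,
    category   TEXT NOT NULL          -- depends on product_no only
);
CREATE TABLE order_items (
    order_no     TEXT NOT NULL,
    product_no   TEXT NOT NULL REFERENCES products(product_no),
    product_name TEXT NOT NULL,       -- snapshot taken when the order was created
    price        INTEGER NOT NULL,    -- snapshot taken when the order was created
    PRIMARY KEY (order_no, product_no)
);
""")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("g1", "Household"), ("g2", "Appliances"), ("g3", "Food"),
     ("g8", "Household"), ("g9", "Household")],
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?, ?)",
    [("o1", "g1", "Laundry detergent", 23),
     ("o1", "g2", "Hair dryer", 125),
     ("o1", "g3", "Broad beans", 5),
     ("o2", "g9", "Quilt", 302),
     ("o2", "g8", "Pillow", 69)],
)


def categories_for_order(order_no):
    """Join each order line with the single stored copy of its product category."""
    rows = conn.execute(
        "SELECT i.product_name, p.category"
        " FROM order_items AS i JOIN products AS p ON p.product_no = i.product_no"
        " WHERE i.order_no = ? ORDER BY i.product_no",
        (order_no,),
    )
    return rows.fetchall()


print(categories_for_order("o1"))
```

Because each category is stored once per product, reclassifying a product is a single-row update in `products` rather than a rewrite of every order line that mentions it.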

3. Third Normal Form

Let us return to the user table from the beginning. Suppose some information about the city is added to it:

ID    Name       Gender  City      City Feature               City Population
0001  Joe Smith  Male    Shenzhen  Technology and innovation  13 million
0002  John Doe   Female  Haikou    Sightseeing                2.3 million

This violates the definition of the third normal form:

Third normal form (3NF): every column in the table must be directly related to the primary key, not merely indirectly related.

Likewise, the third normal form builds on the second.

Clearly, the city population, city feature, and similar attributes depend only on the user's city, not on the user, so their relationship to the user can only be considered indirect.
The best approach is therefore to move the city-related attributes into a separate city information table.
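
A minimal sketch of that separation follows, again assuming SQLite through Python's `sqlite3` module. The table names `cities` and `users`, the column names, and the helper `city_population_for_user` are hypothetical; the population figures are the ones from the table above written out as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (
    city       TEXT PRIMARY KEY,
    feature    TEXT,
    population INTEGER
);
CREATE TABLE users (
    id     TEXT PRIMARY KEY,
    name   TEXT,
    gender TEXT,
    city   TEXT REFERENCES cities(city)
);
INSERT INTO cities VALUES ('Shenzhen', 'Technology and innovation', 13000000);
INSERT INTO cities VALUES ('Haikou', 'Sightseeing', 2300000);
INSERT INTO users VALUES ('0001', 'Joe Smith', 'male', 'Shenzhen');
INSERT INTO users VALUES ('0002', 'John Doe', 'female', 'Haikou');
""")


def city_population_for_user(user_id):
    """Look up the population of a user's city through the cities table."""
    row = conn.execute(
        "SELECT c.population FROM users AS u"
        " JOIN cities AS c ON c.city = u.city WHERE u.id = ?",
        (user_id,),
    ).fetchone()
    return row[0] if row else None


print(city_population_for_user("0001"))
```

City facts now live in exactly one row per city, so they cannot drift apart between two users of the same city.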

Why Normal Forms?

The database normal forms provide a reference model for database design and development, and many textbooks treat them as core course content.
So what problem were the normal forms proposed to solve?

  • The first normal form requires every column to be split into its smallest parts, to eliminate the practice of storing multiple values in one column.
    For example, once the address in the user table is split into province and city fields, each field is clear and can be searched and queried on its own.
  • The second normal form requires a unique primary key and forbids columns that depend on only part of it, to eliminate redundancy in the table.
    For example, the product category and other product details in the order table only need to be stored once, in a product information table.
  • The third normal form requires that no column depend indirectly on the primary key, again to eliminate redundant columns.
    For example, the user table does not need to store the population, features, and other details of the user's city.

Clearly, these normal forms were mostly proposed to eliminate redundancy and so reduce storage costs as far as possible.

PS: So knowing the three normal forms helps the boss save money. No wonder people put it on their resumes...

Besides the three normal forms covered in this article, there are also BCNF and the fourth and fifth normal forms.

The three normal forms let you design a very lean table structure. However, existing projects do not fully follow them, because:

  1. Performance. A table design with no redundancy generates more queries, which means more database I/O operations. In some real-time interactive systems this can be unbearably slow.
    Of course, you can use the database's join operation; in fact, databases provide joins precisely to alleviate this problem. But once a sharding scheme (splitting data across databases and tables) is in place, joins become very difficult.

  2. Changes in the cost structure. The normal forms were proposed in the 20th century, when disk storage was still expensive. As technology has advanced, the cost of data storage has dropped significantly, so the savings from normalized design (avoiding redundancy) are no longer so noticeable.
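
The performance point in item 1 can be sketched as follows. This is an illustration only: two separate in-memory SQLite databases stand in for two shards (an assumption made here, not the author's setup), and all table, column, and function names are hypothetical. Because a SQL join cannot span the two databases, the application has to make two round trips, whereas a denormalized row answers the same question with one query.

```python
import sqlite3

# Two independent databases stand in for two shards (illustrative assumption).
users_db = sqlite3.connect(":memory:")
cities_db = sqlite3.connect(":memory:")
users_db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
users_db.execute("INSERT INTO users VALUES ('0001', 'Joe Smith', 'Shenzhen')")
cities_db.execute("CREATE TABLE cities (city TEXT PRIMARY KEY, population INTEGER)")
cities_db.execute("INSERT INTO cities VALUES ('Shenzhen', 13000000)")


def user_with_population(user_id):
    """Application-side join: one query per database, combined in Python."""
    user = users_db.execute(
        "SELECT name, city FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if user is None:
        return None
    name, city = user
    row = cities_db.execute(
        "SELECT population FROM cities WHERE city = ?", (city,)
    ).fetchone()
    return (name, city, row[0] if row else None)


# Denormalized alternative: the population is copied into the user row.
wide_db = sqlite3.connect(":memory:")
wide_db.execute(
    "CREATE TABLE users_wide ("
    " id TEXT PRIMARY KEY, name TEXT, city TEXT, city_population INTEGER)"
)
wide_db.execute(
    "INSERT INTO users_wide VALUES ('0001', 'Joe Smith', 'Shenzhen', 13000000)"
)


def user_with_population_wide(user_id):
    """Single query against the redundant copy."""
    return wide_db.execute(
        "SELECT name, city, city_population FROM users_wide WHERE id = ?",
        (user_id,),
    ).fetchone()


print(user_with_population("0001"))
print(user_with_population_wide("0001"))
```

The price of the single-query version is that every copy of the population has to be updated when the city's figure changes.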

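Denormalized Design

Since the normal forms exist to eliminate redundancy, denormalization improves performance by adding redundancy and aggregation. For example, to speed up queries, the article table of a CMS can also store a redundant copy of the author's information.
Of course, besides redundancy (storing multiple copies), there is another idea: data aggregation, also called nesting. This approach amounts to merging several fields (columns) and storing them in a single column of a database table.

For example, a single order record can contain a lot of information at once: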

 

{
 "oid": "0001",
 "price": {
  "total": 380,
  "benefit": 40
 },

 "goods": [{
   "gid": "SN001",
   "name": "Blue Moon laundry detergent",
   "price": 41,
   "amount": 2
  },
  {
   "gid": "SN003",
   "name": "Electric shaver",
   "price": 99,
   "amount": 1
  }
 ],

 "address": {
  "contact": "Zhang San",
  "phone": "150899000"
   ...
 }
...
}


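This kind of flexible structure is almost the exclusive preserve of NoSQL. The document database MongoDB, for example, can implement aggregated storage directly with embedded arrays and objects, which undoubtedly brings great flexibility.

MySQL began supporting structured JSON columns in version 5.7.8 and so joined the ranks of aggregated storage, while its counterpart PostgreSQL has supported them since version 9.4 (with the jsonb type).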
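
As a rough sketch of aggregated storage, the order record can be kept as one serialized document in a single column. The example assumes SQLite through Python's `sqlite3` module, with a plain TEXT column standing in for a JSON column; it does not use the native JSON features of MySQL, PostgreSQL, or MongoDB. The table name `orders`, the column `doc`, and the helpers `load_order` and `goods_subtotal` are hypothetical, and only the fields visible in the record above are included.

```python
import json
import sqlite3

# Only the fields shown in the article's example; elided parts are left out.
order = {
    "oid": "0001",
    "price": {"total": 380, "benefit": 40},
    "goods": [
        {"gid": "SN001", "name": "Blue Moon laundry detergent", "price": 41, "amount": 2},
        {"gid": "SN003", "name": "Electric shaver", "price": 99, "amount": 1},
    ],
    "address": {"contact": "Zhang San", "phone": "150899000"},
}

conn = sqlite3.connect(":memory:")
# A TEXT column stands in for a JSON column (illustrative assumption).
conn.execute("CREATE TABLE orders (oid TEXT PRIMARY KEY, doc TEXT NOT NULL)")
conn.execute(
    "INSERT INTO orders VALUES (?, ?)",
    (order["oid"], json.dumps(order, ensure_ascii=False)),
)


def load_order(oid):
    """Fetch the whole aggregated order with a single-row read."""
    row = conn.execute("SELECT doc FROM orders WHERE oid = ?", (oid,)).fetchone()
    return json.loads(row[0]) if row else None


def goods_subtotal(doc):
    """Sum price * amount over the nested goods array."""
    return sum(item["price"] * item["amount"] for item in doc["goods"])


loaded = load_order("0001")
print(loaded["address"]["contact"], goods_subtotal(loaded))
```

One read returns the order, its items, and its address together, with no joins; the trade-off is that the database cannot enforce the structure of what sits inside `doc`.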

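Denormalized design is also very common in internet projects and open-source products. For example, the table design of the well-known Discuz contains many redundant columns and aggregated fields.
Besides the performance gains, data compression and highly flexible (unstructured) extension are also reasons why denormalized design is favored.

Of course, this is not a blanket objection to the database normal forms. Understanding them is still a foundation of good database design, for example when choosing an appropriate primary key or clearly dividing the attributes of each column.
In a project you still need to find a balance between normalization and denormalization according to your own business characteristics (usually a combination of the two). As with trading space for time in architecture design, every trade-off involved has to be weighed.

You could also call this an art, because there is no standard answer...

Author: 美码师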


Origin www.cnblogs.com/huaweicloud/p/11867929.html