Database design and development review

Database design and development

Entity-Relationship Model

  • ER diagrams

  • Weak entity sets

  • Converting ER diagrams to relations

Purpose of E/R Model

 The E/R model allows us to sketch database schema designs.[Includes some constraints, but not operations.]
 Designs are pictures called entity-relationship diagrams.
 Later: convert E/R designs to relational DB designs.

Entity Sets

 Entity = "thing" or object
 Entity set = collection of similar entities[similar to a class in object-oriented languages]
 Attribute = property of (the entities of) an entity set [Attributes are simple values, e.g. integers or character strings, not structs, sets, etc.] That is, attributes are atomic values, not structures.

E/R Diagrams

 In an entity-relationship diagram:
 Entity set = rectangle
 Attribute = oval, with a line to the rectangle representing its entity set

Relationships

 A relationship connects two or more entity sets.
 It is represented by a diamond, with lines to each of the entity sets involved.

Relationship Set

 The current "value" of an entity set is the set of entities that belong to it.
 The "value" of a relationship is a relationship set, a set of tuples with one component for each related entity set. [Each tuple combines one entity from each of the related entity sets.]

Multiway Relationships

 Sometimes, we need a relationship that connects more than two entity sets.
 [A ternary relationship, which connects three entity sets, is the simplest example of a multiway relationship.]

Many-Many Relationships

 In a many-many relationship, an entity of either set can be connected to many entities of the other set.[a bar sells many beers; a beer is sold by many bars]

Many-One Relationships

 Some binary relationships are many-one from one entity set to another.
 Each entity of the first set is connected to at most one entity of the second set. But an entity of the second set can be connected to zero, one, or many entities of the first set.

One-One Relationships

 In a one-one relationship, each entity of either entity set is related to at most one entity of the other set.

Representing “Multiplicity”

 Show a many-one relationship by an arrow entering the "one" side. [This resembles a functional dependency: an entity on the "many" side determines at most one entity on the "one" side, so the arrow points to the entity set being determined.]
 Show a one-one relationship by arrows entering both entity sets.
 Rounded arrow = "exactly one", i.e., each entity of the first set is related to exactly one entity of the target set. [A rounded arrow means the related entity must exist.]

That is, every manufacturer must have a best-selling beer, but a given beer is not necessarily its manufacturer's best seller.

Attributes on Relationships

 Sometimes it is useful to attach an attribute to a relationship.
 Think of this attribute as a property of tuples in the relationship set.[Equivalent to the properties of each tuple]

Equivalent Diagrams Without Attributes on Relationships

 Create an entity set representing values of the attribute.
 Make that entity set participate in the relationship.

Originally, price is an attribute of the Sells relationship. Instead, we can add an entity set Prices and make Sells a ternary relationship, so that price becomes an attribute of the Prices entity set. An arrow in a multiway relationship means that the other entity sets, taken together, determine a unique entity of the set the arrow points to.

Roles

 Sometimes an entity set appears more than once in a relationship.
 Label the edges between the relationship and the entity set with names called roles.

When an entity set appears twice in a relationship, it must be marked with Roles.

Subclasses

 Subclass = special case = fewer entities = more properties.
 Example: Ales are a kind of beer. Not all beers are ales, but some beers are. Suppose that in addition to all the attributes of beers, ales also have the attribute color.

Ales inherits from Beers and adds the attribute color.

E/R Vs. Object-Oriented Subclasses

In OO, objects are in one class only. In contrast, E/R entities have representatives in all subclasses to which they belong.
Rule: if entity e is represented in a subclass, then e is represented in the superclass (and recursively up the tree).

That is, every entity in a subclass is also in its superclass.

Keys

A key is a set of attributes for one entity set such that no two entities in this set agree on all the attributes of the key. [It is allowed for two entities to agree on some, but not all, of the key attributes.] No two entities can share the same key value.
We must designate a key for every entity set.

Keys in E/R Diagrams

In an isa hierarchy, only the root entity set has a key, and it must serve as the key for all entities in the hierarchy. [This makes sense: entities in a subclass are also in the parent class, and the root's key is what distinguishes all entities.]

Weak Entity Sets

Occasionally, entities of an entity set need "help" to identify them uniquely.
Entity set E is said to be weak if in order to identify entities of E uniquely, we need to follow one or more many-one relationships from E and include the key of the related entities from the connected entity sets. [That is, E's own attributes are not enough to tell its entities apart; the keys of related entity sets must be added.]

Example: a player may share a name or a number with a player on another team, so the team's name is needed to identify a player uniquely. Players is therefore a weak entity set supported by Teams. The arrow is rounded because every player must belong to a team.

Weak Entity-Set Rules

A weak entity set has one or more many-one relationships to other (supporting) entity sets.
[Not every many-one relationship needs to be supporting. But supporting relationships must have a rounded arrow (the entity at the "one" end is guaranteed to exist).]
[This is the total participation constraint.]
Because a many-one relationship with a rounded arrow guarantees that the corresponding supporting entity exists, its key can be borrowed.
[The key for a weak entity set is its own underlined attributes and the keys for the supporting entity sets.]

The key of Players is its number together with the name of its team (from Teams).

Design Techniques

Avoid redundancy
Limit the use of weak entity sets
Don't use an entity set when an attribute will do

Avoiding Redundancy

Redundancy = saying the same thing in two or more different ways.
Wastes space and (more importantly) encourages inconsistency.[Waste space and cause inconsistency]
Two representations of the same fact become inconsistent if we change one and forget to change the other.
Recall the anomalies caused by FDs. [Redundancy leads to the update and deletion anomalies discussed under relational schema design.]

Representing the manufacturer both as an attribute of Beers and as a related entity set (Manfs) creates redundancy: the same fact is stated twice.

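This design stores the manufacturer and its address as attributes of each beer. The address is repeated once for each beer the manufacturer makes, and if the manufacturer temporarily has no beers, its address is lost. [The first problem is redundancy; the second is a deletion anomaly.]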

Entity Sets Versus Attributes

An entity set should satisfy at least one of the following conditions:
It is more than the name of something; it has at least one nonkey attribute.[There is at least one nonkey attribute]
It is the "many" in a many-one or many-many relationship. [That is, it sits on the "many" side of some relationship.]

Don't Overuse Weak Entity Sets

Beginning database designers often doubt that anything could be a key by itself. They make all entity sets weak, supported by all other entity sets to which they are linked. [That is, they assume no entity set has a key of its own and borrow keys from every connected entity set.]
In reality, we usually create unique ID's for entity sets. [Entities are often simply numbered.]

When do we need weak Entity Sets?

The usual reason is that there is no global authority capable of creating unique ID's.[There are some things that cannot be made unique by numbering alone]

From E/R Diagrams to Relations

Entity set ->relation; Attributes->attributes.[The entity set becomes a table, and the attributes are still attributes]
Relationships -> relation whose attributes are only:
The keys of the connected entity sets.
Attributes of the relationship itself.[The relationship also becomes a table, and the attributes are the keys of the related entities and their own attributes. ]

Combining Relations

OK to combine into one relation:
The relation for an entity set E
The relations for many-one relationships of which E is the "many".

Example: Favorite (from Drinkers to Beers) is a many-one relationship with Drinkers on the "many" side, so it can be merged into the relation for Drinkers, giving Drinkers(name, addr, favBeer).
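To make the merge concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names follow the Drinkers/Favorite example. The column types and the sample rows (Sally, Joe, Bud) are made up for illustration. The sketch first builds the two separate relations and then the combined one (named Drinkers1 here only to avoid a name clash).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two separate relations: one for the entity set, one for the many-one relationship.
conn.executescript("""
CREATE TABLE Drinkers(name TEXT PRIMARY KEY, addr TEXT);
CREATE TABLE Favorite(drinker TEXT PRIMARY KEY REFERENCES Drinkers(name), beer TEXT);
INSERT INTO Drinkers VALUES ('Sally', '123 Maple'), ('Joe', '45 Elm');
INSERT INTO Favorite VALUES ('Sally', 'Bud');
""")

# Each drinker has at most one favorite beer, so the drinker alone is the key of
# Favorite, and the relationship can be folded into the drinker's row.
conn.executescript("""
CREATE TABLE Drinkers1(name TEXT PRIMARY KEY, addr TEXT, favBeer TEXT);
INSERT INTO Drinkers1
  SELECT d.name, d.addr, f.beer
  FROM Drinkers AS d LEFT JOIN Favorite AS f ON f.drinker = d.name;
""")

rows = conn.execute("SELECT * FROM Drinkers1 ORDER BY name").fetchall()
print(rows)  # [('Joe', '45 Elm', None), ('Sally', '123 Maple', 'Bud')]
```

The LEFT JOIN keeps drinkers who have no favorite beer. Those drinkers get NULL in favBeer.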

Risk with Many-Many Relationships

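Combining an entity set with a many-many relationship may cause redundancy. For example, merging Drinkers with Likes would repeat a drinker's address once for each beer the drinker likes.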

Handling Weak Entity Sets

Relation for a weak entity set must include attributes for its complete key (including those belonging to other entity sets), as well as its own, nonkey attributes. [The key must be complete.]
A supporting relationship is redundant and yields no relation (unless it has attributes).

Subclasses: Three Approaches

Three Approaches to representing subclasses and their attributes in a database design

Object-oriented: One relation per subset of subclasses, with all relevant attributes.
[This approach has one relation (table) for each subset of subclasses, and each relation contains all the attributes relevant to that subset. Attributes that do not apply to that subset are not included.] Each such table holds both the inherited attributes and the subclass's own attributes.
Use nulls: One relation; entities have NULL in attributes that don't belong to them.
[In this approach, a single relation (table) is used for all subclasses. Entities belonging to different subclasses may have NULL values in attributes that do not apply to them. This allows for flexibility in representing different subsets of objects within the same table.] All subclasses share one table.
E/R style: One relation for each subclass:
Key attribute(s)
Attributes of that subclass

In the OO approach, the subclass table repeats all the attributes of the parent class table [essentially inheritance].

In the E/R approach, the subclass table contains only the key of the root entity set plus the subclass's own attributes. The other inherited attributes are not repeated.

In the nulls approach, there is only one table, and an attribute that does not apply to an entity is NULL.
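The three approaches can be compared side by side with sqlite3. This is a hedged sketch. The Beers/Ales tables, the manf column and the sample rows are illustrative assumptions built on the ales example. The table-name suffixes (_er, _oo, _null) only keep the three designs apart in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- E/R style: one table per entity set; Ales keeps only the key plus its own attribute.
CREATE TABLE Beers_er(name TEXT PRIMARY KEY, manf TEXT);
CREATE TABLE Ales_er(name TEXT PRIMARY KEY REFERENCES Beers_er(name), color TEXT);
INSERT INTO Beers_er VALUES ('Bud', 'A-B'), ('SummerBrew', 'Petes');
INSERT INTO Ales_er VALUES ('SummerBrew', 'dark');

-- Object-oriented: one table per subset of subclasses; each entity lives in exactly one table.
CREATE TABLE Beers_oo(name TEXT PRIMARY KEY, manf TEXT);
CREATE TABLE Ales_oo(name TEXT PRIMARY KEY, manf TEXT, color TEXT);
INSERT INTO Beers_oo VALUES ('Bud', 'A-B');
INSERT INTO Ales_oo VALUES ('SummerBrew', 'Petes', 'dark');

-- Nulls: a single table; beers that are not ales get NULL for color.
CREATE TABLE Beers_null(name TEXT PRIMARY KEY, manf TEXT, color TEXT);
INSERT INTO Beers_null VALUES ('Bud', 'A-B', NULL), ('SummerBrew', 'Petes', 'dark');
""")

# The same question in each design: the manufacturer and color of every ale.
er = conn.execute(
    "SELECT b.name, b.manf, a.color FROM Beers_er AS b JOIN Ales_er AS a ON a.name = b.name"
).fetchall()
oo = conn.execute("SELECT name, manf, color FROM Ales_oo").fetchall()
nl = conn.execute("SELECT name, manf, color FROM Beers_null WHERE color IS NOT NULL").fetchall()
print(er, oo, nl)  # each is [('SummerBrew', 'Petes', 'dark')]
```

The same question needs a join in the E/R design, a single table in the OO design, and a NULL test in the nulls design. The last query assumes every ale has a non-NULL color. Listing all beers, by contrast, would need a UNION in the OO design.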

Homework Library Management System Design

Design Theory for Relational Databases

  • Functional Dependencies

  • Decompositions

  • Normal Forms

Functional Dependencies

X->Y (read "X determines Y") is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, they must also agree on all the attributes of Y.
Say "X->Y holds in R." [That is, every instance of R satisfies the assertion.]
Convention: ..., X, Y, Z represent sets of attributes; A, B, C, ... represent single attributes.
Convention: no set braces in sets of attributes; write ABC, not {A,B,C}.

We use R for the relation schema and r for a specific relation instance (with data).

The FDs discussed here are about the schema, not about any particular data, so we use R.

"Any two tuples of R" means the FD must hold for every possible instance, not just the current data.
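The difference between a schema R and an instance r shows up clearly in code. The sketch below checks whether one particular instance r satisfies X->Y by comparing every pair of tuples. The function fd_holds and the sample tuples are hypothetical. A True result only says that this r does not violate the FD. An FD on R must hold for every possible instance, and no finite check of data can establish that.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if every pair of tuples that agrees on lhs also agrees on rhs (in this instance only)."""
    for s, t in combinations(rows, 2):
        if all(s[a] == t[a] for a in lhs) and any(s[b] != t[b] for b in rhs):
            return False
    return True

# A hypothetical instance r of Drinkers(name, addr, beer).
r = [
    {"name": "Sally", "addr": "123 Maple", "beer": "Bud"},
    {"name": "Sally", "addr": "123 Maple", "beer": "Miller"},
    {"name": "Joe", "addr": "45 Elm", "beer": "Bud"},
]
print(fd_holds(r, ["name"], ["addr"]))  # True
print(fd_holds(r, ["name"], ["beer"]))  # False
```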

Splitting Right Sides of FD’s

X->A1A2...An holds for R exactly when each of X->A1, X->A2, ..., X->An holds for R.
Example: A->BC is equivalent to A->B and A->C.
There is no splitting rule for left sides.
We'll generally express FD's with singleton right sides. [That is, split each FD so that its right side is a single attribute.]

FD's = functional dependencies

Keys of Relations

K is a superkey for relation R if K functionally determines all of R.
K is a key for R if K is a superkey but no proper subset of K is superkey.

A superkey determines all other attributes. A key is also a superkey, but no proper subset of a key is a superkey.

Where Do Keys Come From?

1. Just assert a key K
The only FD's are K->A for all attributes A.
2. Assert FD's and deduce the keys by systematic exploration.
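Option 2 can be mechanized with attribute closures, computed as in the closure test described later. In this sketch FDs are (lhs, rhs) pairs of single-letter attribute strings, and closure/find_keys are illustrative names that do not come from the source. Sets are tried smallest first, and supersets of keys already found are skipped, so every reported set is minimal.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_keys(schema, fds):
    """All keys of schema: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a key: a superkey, but not minimal
            if set(schema) <= closure(cand, fds):
                keys.append(cand)
    return keys

fds = [("AB", "C"), ("C", "B")]
print(["".join(sorted(k)) for k in find_keys("ABC", fds)])  # ['AB', 'AC']
```

The example is the AB->C, C->B schema from the 3NF motivation, which has the two keys AB and AC.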

Inferring FD’s

We are given FD's X1->A1, X2->A2, ..., Xn->An, and we want to know whether an FD Y->B must hold in any relation that satisfies the given FD's.
[That is, does Y->B also hold in every relation that satisfies the known FD's?]
Example: If A -> B and B-> C hold, surely A -> C holds, even if we don't say so.

Inference Test

To test if Y->B, start by assuming two tuples agree in all attributes of Y.
Use the given FD's to infer that these tuples must also agree in certain other attributes.
If B is one of these attributes, then Y->B is true.
Otherwise, the two tuples, with any forced equalities, form a two-tuple relation that proves Y->B does not follow from the given FD's. [This two-tuple relation satisfies every given FD but violates Y->B, so it is a counterexample.]
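A sketch of the counterexample in Python, assuming FDs are written as (lhs, rhs) attribute strings. The attributes the two tuples are forced to agree on are exactly the closure of Y. The function therefore computes that closure, gives both tuples the value 0 on it, and lets the second tuple differ (value 1) everywhere else. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def counterexample(schema, fds, y, b):
    """None if Y->B follows from fds; otherwise a two-tuple relation satisfying fds but not Y->B."""
    agreed = closure(y, fds)          # attributes the two tuples are forced to agree on
    if b in agreed:
        return None
    t1 = {a: 0 for a in schema}
    t2 = {a: (0 if a in agreed else 1) for a in schema}
    return [t1, t2]

fds = [("A", "B"), ("B", "C")]
print(counterexample("ABCD", fds, "A", "C"))  # None, because A->C follows
print(counterexample("ABCD", fds, "A", "D"))
# [{'A': 0, 'B': 0, 'C': 0, 'D': 0}, {'A': 0, 'B': 0, 'C': 0, 'D': 1}]
```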

Closure Test

An easier way to test is to compute the closure of Y, denoted Y+
Basis: Y+ = Y.
Induction: Look for an FD's left side X that is a subset of the current Y+. If the FD is X->A, add A to Y+. Repeat until no more attributes can be added.
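A direct Python translation of the basis and induction steps follows. It uses an (lhs, rhs) string convention for FDs, which is an assumption of this sketch. Y->B follows from the given FDs exactly when B is in the returned set.

```python
def closure(attrs, fds):
    """Compute attrs+ under fds, where each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)                      # Basis: Y+ = Y
    changed = True
    while changed:                           # repeat the induction until nothing new is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)           # left side is inside Y+, so add the right side
                changed = True
    return result

fds = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(closure("A", fds)))   # ['A', 'B', 'C']
print(sorted(closure("AD", fds)))  # ['A', 'B', 'C', 'D', 'E']
```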

Finding ALL Implied FD’s

Motivation:"normalization," the process where we break a relation schema into two or more schemas.

The right side of a functional dependency can be split, but the left side cannot, so a left side may contain redundant attributes.


Basic idea

Start with the given FD's and find all nontrivial FD's that follow from them.
Nontrivial = right side not contained in the left.
Restrict to those FD's that involve only attributes of the projected schema. [Restrict the scope.]

As long as the closure of X contains A, X->A holds.

If F originally contains XY->A but X+ already contains A,

then XY->A can be replaced with X->A, since Y is redundant on the left side.
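Here is a brute-force sketch of projecting FDs onto a sub-schema, following the basic idea above. It takes every subset X of the sub-schema except the empty and full sets, computes X+ with the original FDs, and emits X->A for each new attribute A inside the sub-schema. An FD is skipped when a proper subset of X already determines A, which is the XY->A to X->A simplification. The function name and the example FDs are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project_fds(sub, fds):
    """Nontrivial X->A (single A) implied by fds, using only attributes of sub."""
    found = []
    for size in range(1, len(sub)):              # skip the empty set and the full set
        for combo in combinations(sorted(sub), size):
            x = set(combo)
            for a in sorted((closure(x, fds) & set(sub)) - x):
                # Skip XY->A when a smaller left side already gives A.
                if not any(set(z) < x and za == a for z, za in found):
                    found.append(("".join(combo), a))
    return found

# R(ABCD) with A->B, B->C, C->D, projected onto R1(ACD).
print(project_fds("ACD", [("A", "B"), ("B", "C"), ("C", "D")]))
# [('A', 'C'), ('A', 'D'), ('C', 'D')]
```

A->D is still implied by A->C and C->D. A minimal basis of the projected FDs would drop it.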

A Few Tricks

No need to compute the closure of the empty set or of the set of all attributes.
If we find X+ = all attributes, so is the closure of any superset of X.
[Then X is a superkey, so its supersets can be skipped.]

[Projection here means keeping only the FD's that mention attributes of a decomposed relation.]

A Geometric View of FD’s

Imagine the set of all instances of a particular relation.
That is, all finite sets of tuples that have the proper number of components.
Each instance is a point in this space.

An FD is a Subset of Instances

For each FD X->A, there is a subset of all instances that satisfy the FD
We can represent an FD by a region in the space.
Trivial FD = an FD that is represented by the entire space. [e.g., A->A]

Representing Sets of FD’s

If each FD is a set of relation instances, then a collection of FD's corresponds to the intersection of those sets.
Intersection = all instances that satisfy all of the FD's

Implication of FD’s

If an FD Y->B follows from X1->A1,...Xn->An, then the region in the space of instances for Y->B must include the intersection of the regions for the FD's Xi->Ai
That is, every instance satisfying all the FD's Xi->Ai surely satisfies Y->B
But an instance could satisfy Y->B, yet not be in this intersection

Relational schema design

Goal of relational schema design is to avoid anomalies and redundancy.
Update anomaly: one occurrence of a fact is changed, but not all occurrences.
Deletion anomaly: a valid fact is lost when a tuple is deleted.

Boyce-Codd Normal Form

We say a relation R is in BCNF if whenever X->Y is a nontrivial FD that holds in R, X is a superkey.
nontrivial means Y is not contained in X
a superkey is any superset of a key
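A small BCNF check, using the (lhs, rhs) attribute-string convention; the function names are hypothetical. It tests only the FDs it is given. That is enough when they are the FDs of R itself, as noted in the decomposition steps. For a projected relation, the projected FDs would have to be supplied instead.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Given FDs X->Y that are nontrivial but whose left side X is not a superkey."""
    return [(x, y) for x, y in fds
            if not set(y) <= set(x) and not set(schema) <= closure(x, fds)]

print(bcnf_violations("ABC", [("AB", "C"), ("C", "B")]))  # [('C', 'B')]
```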

Decomposition into BCNF

Given: relation R with FD's F.
Look among the given FD's for a BCNF violation X->Y.
If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF.
Compute X+.
X+ is not all attributes, or else X would be a superkey.

Decompose R using X->Y

Replace R by relations with schemas:
R1 = X+
R2 = R - (X+ - X)
Project the given FD's F onto the two new relations.
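The decomposition steps can be sketched recursively in Python. This is an illustrative implementation, not a prescribed one. The FD projection is brute force, so its cost is exponential in the number of attributes. The example uses a hypothetical drinkers schema with the single-letter attributes N (name), A (addr), B (beersLiked), M (manf) and F (favBeer), and the assumed FDs N->AF and B->M.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project(sub, fds):
    """All nontrivial X->A (single A) implied by fds with X and A inside sub (brute force)."""
    out = []
    for size in range(1, len(sub)):
        for combo in combinations(sorted(sub), size):
            for a in sorted((closure(combo, fds) & set(sub)) - set(combo)):
                out.append(("".join(combo), a))
    return out

def bcnf_decompose(schema, fds):
    """Split schema (a string of attributes) into BCNF schemas using X->Y violations."""
    for x, y in fds:
        xplus = closure(x, fds) & set(schema)
        if not set(y) <= set(x) and xplus != set(schema):        # BCNF violation
            r1 = "".join(sorted(xplus))                           # R1 = X+
            r2 = "".join(sorted(set(schema) - (xplus - set(x))))  # R2 = R - (X+ - X)
            return (bcnf_decompose(r1, project(r1, fds))
                    + bcnf_decompose(r2, project(r2, fds)))
    return ["".join(sorted(schema))]

print(bcnf_decompose("NABMF", [("N", "AF"), ("B", "M")]))  # ['AFN', 'BM', 'BN']
```

Other valid BCNF decompositions may exist, because the result depends on which violation is chosen first. This sketch simply takes the first one it finds.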

Third Normal Form – Motivation

There is one structure of FD's that causes trouble when we decompose. [BCNF decomposition can lose the ability to enforce an FD.]
AB->C and C->B
There are two keys,{A,B} and {A,C}
C->B is a BCNF violation, so we must decompose into AC,BC.

We cannot enforce FD’s

The problem is that if we use AC and BC as our database schema, we cannot enforce the FD AB->C by checking FD's in these decomposed relations.

3NF

3NF modifies the BCNF condition so we do not have to decompose in this problem situation
An attribute is prime if it is a member of any key.
X->A violates 3NF if and only if X is not a superkey, and also A is not prime.
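Checking the 3NF condition needs the keys, to find the prime attributes, as well as closures. The sketch below finds keys by trying attribute sets smallest first. The names and examples are illustrative assumptions.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def keys_of(schema, fds):
    """Minimal attribute sets whose closure is the whole schema, smallest first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            c = set(combo)
            if not any(k <= c for k in keys) and set(schema) <= closure(c, fds):
                keys.append(c)
    return keys

def violations_3nf(schema, fds):
    """(X, A) pairs where X->A is given, X is not a superkey, and A is not prime."""
    prime = set().union(*keys_of(schema, fds))
    out = []
    for x, y in fds:
        if set(schema) <= closure(x, fds):
            continue  # X is a superkey
        for a in sorted(set(y) - set(x)):
            if a not in prime:
                out.append((x, a))
    return out

print(violations_3nf("ABC", [("AB", "C"), ("C", "B")]))  # []
print(violations_3nf("ABC", [("A", "B"), ("B", "C")]))   # [('B', 'C')]
```

In the first example, C->B breaks BCNF but not 3NF, because B belongs to the key AB.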

What 3NF and BCNF give you

There are two important properties of a decomposition:
Lossless Join: it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original
Dependency Preservation: it should be possible to check in the projected relations whether all the given FD's are satisfied.

Testing for a Lossless Join

If we project R onto R1, R2, ..., Rk, can we recover R by rejoining?
Any tuple in R can be recovered from its projected fragments.
So the only question is: when we rejoin, do we ever get back something we didn't have originally?

The Chase Test

Suppose tuple t comes back in the join.
Then t is the join of projections of some tuples of R, one for each Ri of the decomposition.
Can we use the given FD's to show that one of these tuples must be t?
Start by assuming t = abc...
For each i, there is a tuple si of R that has a,b,c,... in the attributes of Ri
si can have any values in other attributes
We'll use the same letter as in t, but with a subscript, for these components.

Summary of the chase

If two rows agree in the left side of an FD, make their right sides agree too.
Always replace a subscripted symbol by the corresponding unsubscripted one, if possible.
If we ever get an unsubscripted row, we know any tuple in the project-join is in the original (the join is lossless).
Otherwise, the final tableau is a counterexample.
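Here is a compact sketch of the chase. Each tableau symbol is represented as an (attribute, subscript) pair, with subscript 0 standing for the unsubscripted letter. This encoding is an assumption made for the sketch. Because 0 sorts first, taking the smaller of two symbols when equating them always prefers the unsubscripted one, as the summary requires.

```python
def chase_lossless(schema, decomposition, fds):
    """Chase test: True if joining the projections onto `decomposition` is lossless."""
    # Row i has the unsubscripted symbol (a, 0) on attributes of R_i and (a, i + 1) elsewhere.
    rows = [{a: (a, 0) if a in ri else (a, i + 1) for a in schema}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for s in rows:
                for t in rows:
                    if s is t or any(s[a] != t[a] for a in lhs):
                        continue
                    for a in rhs:
                        if s[a] != t[a]:
                            # Keep the unsubscripted symbol when either row has it.
                            keep, drop = min(s[a], t[a]), max(s[a], t[a])
                            for row in rows:          # rename drop -> keep in the whole column
                                if row[a] == drop:
                                    row[a] = keep
                            changed = True
    return any(all(row[a] == (a, 0) for a in schema) for row in rows)

fds = [("A", "B"), ("B", "C"), ("CD", "A")]
print(chase_lossless("ABCD", ["AD", "AC", "BCD"], fds))   # True
print(chase_lossless("ABC", ["AB", "BC"], []))            # False
print(chase_lossless("ABC", ["AB", "BC"], [("B", "C")]))  # True
```

In the first call, the chase turns the BCD row into an all-unsubscripted row, so the join is lossless. Splitting ABC into AB and BC is lossy with no FDs, but lossless once B->C holds.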

3NF Synthesis Algorithm

We can always construct a decomposition into 3NF relations with a lossless join and dependency preservation.
Need minimal basis for the FD's:
Right sides are single attributes.
No FD can be removed.
No attributes can be removed from a left side.
One relation for each FD in the minimal basis.
[Schema is the union of the left and right sides.]
If no key is contained in an FD, then add one relation whose schema is some key.
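The synthesis steps can be sketched as follows, assuming the input is already a minimal basis; computing one is a separate step. A relation contains a key exactly when the closure of its attributes is the whole schema. If no relation does, the sketch shrinks the full attribute set to a key by dropping attributes one at a time. The names and examples are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def synthesize_3nf(schema, minimal_basis):
    """One relation per FD of an (assumed) minimal basis, plus a key relation if needed."""
    relations = ["".join(sorted(set(x) | set(a))) for x, a in minimal_basis]
    if not any(set(schema) <= closure(r, minimal_basis) for r in relations):
        # No relation contains a key: shrink the full schema down to a key.
        key = set(schema)
        for attr in sorted(schema):
            if set(schema) <= closure(key - {attr}, minimal_basis):
                key.discard(attr)
        relations.append("".join(sorted(key)))
    return relations

print(synthesize_3nf("ABCD", [("A", "B"), ("C", "D")]))  # ['AB', 'CD', 'AC']
print(synthesize_3nf("ABC", [("AB", "C"), ("C", "B")]))  # ['ABC', 'BC']
```

In the second example, BC is contained in ABC. Many presentations also drop a relation whose schema is contained in another's, which this sketch does not do.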

Why it works

Preserves dependencies: each FD from a minimal basis is contained in a relation, thus preserved
Lossless Join: use the chase to show that the row for the relation that contains a key can be made all-unsubscripted variables.
3NF: the hard part; it follows from a property of minimal bases.

Full Functional Dependency

Y is 'fully functionally dependent' on X if it is dependent on all of X, not on any part of X:
X->Y holds, and
for no proper subset X' of X does X'->Y hold.
An FD X->Y is a full functional dependency if the removal of any attribute from X means the dependency does not hold anymore.
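Full dependency can be tested with closures. X->A must hold, and A must not be in the closure of any proper subset of X, including the empty set. Here is a sketch using the (lhs, rhs) FD convention, with hypothetical names:

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_full_fd(lhs, a, fds):
    """X->A is full if it holds and no proper subset X' of X gives X'->A."""
    if a not in closure(lhs, fds):
        return False
    return not any(a in closure(sub, fds)
                   for size in range(len(lhs))          # proper subsets, including the empty set
                   for sub in combinations(lhs, size))

fds = [("AB", "C"), ("A", "D")]
print(is_full_fd("AB", "C", fds))  # True
print(is_full_fd("AB", "D", fds))  # False, because A alone determines D
```

With these FDs, AB is the only key of R(ABCD). The second call shows that the nonprime attribute D depends on only part of that key, so R is not in 2NF.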

2NF

R is in 2NF if every nonprime attribute A in R is fully functionally dependent on every key of R.

Axiom system for data dependencies

Armstrong's axiom system

The axiom system of functional dependencies is the theoretical basis of schema decomposition algorithms. The Armstrong axiom system is valid (sound) and complete.

Definition 5.11

For a relation schema R(U,F) that satisfies a set of functional dependencies F: if X->Y holds in every relation r of R(U,F) (that is, for any two tuples s, t in r, if s[X] = t[X], then s[Y] = t[Y]), then F is said to logically imply X->Y.

Armstrong's axiom system

Assume U is the set of all attributes and F is a set of functional dependencies on U, giving the relation schema R(U,F). For R(U,F), there are the following inference rules:
A1 reflexivity:
If Y ⊆ X ⊆ U, then X->Y is implied by F.
A2 augmentation:
If X->Y is implied by F and Z ⊆ U, then XZ->YZ is implied by F.
A3 transitivity:
If X->Y and Y->Z are implied by F, then X->Z is implied by F.

R is the relation schema, U is the set of all attributes, and F is the given set of functional dependencies. ["Implied by F" means the FD holds in every relation that satisfies F.]

Proof:
A1:
Suppose Y ⊆ X ⊆ U.
For any two tuples t, s in any relation r of R(U,F):
If t[X] = s[X], then since Y ⊆ X, t[Y] = s[Y]; so X->Y holds, and the reflexivity rule holds.

A2:
Suppose X->Y is implied by F and Z ⊆ U.
For any two tuples t, s in any relation r of R(U,F):
If t[XZ] = s[XZ], then t[X] = s[X] and t[Z] = s[Z]; from X->Y, t[Y] = s[Y]; hence t[YZ] = s[YZ], so XZ->YZ holds, and the augmentation rule holds.

A3:
Suppose X->Y and Y->Z are implied by F.
For any two tuples t, s in any relation r of R(U,F):
If t[X] = s[X], then from X->Y, t[Y] = s[Y], and from Y->Z, t[Z] = s[Z]; so X->Z holds, and the transitivity rule holds.

Inference rules

Union rule:
From X->Y and X->Z, we get X->YZ.
Pseudo-transitivity rule:
From X->Y and WY->Z, we get XW->Z.
Decomposition rule:
From X->Y and Z ⊆ Y, we get X->Z.
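The three derived rules follow from A1 to A3. Here is a short derivation sketch, written in LaTeX:

```latex
\begin{aligned}
&\textbf{Union: } X \to Y,\; X \to Z \;\Rightarrow\; X \to YZ\\
&\quad X \to XY \quad \text{(A2: augment } X \to Y \text{ by } X\text{, and } XX = X)\\
&\quad XY \to YZ \quad \text{(A2: augment } X \to Z \text{ by } Y)\\
&\quad X \to YZ \quad \text{(A3 on the two lines above)}\\[4pt]
&\textbf{Pseudo-transitivity: } X \to Y,\; WY \to Z \;\Rightarrow\; XW \to Z\\
&\quad XW \to WY \quad \text{(A2: augment } X \to Y \text{ by } W)\\
&\quad XW \to Z \quad \text{(A3 with } WY \to Z)\\[4pt]
&\textbf{Decomposition: } X \to Y,\; Z \subseteq Y \;\Rightarrow\; X \to Z\\
&\quad Y \to Z \quad \text{(A1, since } Z \subseteq Y)\\
&\quad X \to Z \quad \text{(A3)}
\end{aligned}
```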

Lemma 1

By the union and decomposition rules, X->A1A2...Ak holds if and only if X->Ai holds for each i = 1, 2, ..., k.

[Combining or splitting the right side makes no difference.]

Definition 1

In the relation schema R(U,F), the set of all functional dependencies logically implied by F is called the closure of F, denoted F+.

F+ is obtained by repeatedly deriving new functional dependencies from F. [This is expensive: F+ can contain exponentially many FDs in the number of attributes.]

Armstrong's axioms are valid and complete

Validity:
Every functional dependency derived from F by Armstrong's axioms is in F+.
Completeness:
Every functional dependency in F+ can be derived from F by Armstrong's axioms.

A direct proof of completeness would have to compute the set of all functional dependencies derivable from F by Armstrong's axioms.
That set can be exponentially large, so computing it directly is impractical. The closure of an attribute set is used instead.

Definition 2

Let F be a set of functional dependencies on the attribute set U, and let X ⊆ U. Then XF+ = {A | X->A can be derived from F by Armstrong's axioms} is called the closure of X with respect to F.

It makes sense to compute the closure of a set of attributes X with respect to a set of functional dependencies F.

XF+ is the set of all attributes A such that X->A; all such A's together form the closure.

The closure of an FD set is a set of FDs, while the closure of an attribute set is a set of attributes.

The closure of X can have at most as many attributes as U, so it has an upper bound.

By computing the closures of all attribute sets (single attributes, then pairs, and so on), we can find all functional dependencies, and thus the closure of the FD set F.

If the closure of an attribute set such as XYZ is all of U, then XYZ determines U, so XYZ is a superkey. To find keys, start with single attributes and check whether each closure is U, then try pairs, and so on. [Checking smaller sets first, and skipping supersets of sets already found, guarantees minimality, so each attribute set found this way is a key.]

Lemma 2

Let F be a set of functional dependencies on the attribute set U, with X, Y ⊆ U. X->Y can be derived from F by Armstrong's axioms if and only if Y ⊆ XF+.
The problem of judging whether X->Y can be derived from F therefore reduces to computing XF+ and checking whether Y ⊆ XF+.

Algorithm 1

Find the closure XF+ of the attribute set X (X is contained in U) with respect to the functional dependency set F on U
Input:X,F
Output:XF+
step:
(1) Let X(0) = X, i = 0.
(2) Compute B = {A | there exist V and W such that V->W ∈ F, V ⊆ X(i), and A ∈ W}.
[Scan all functional dependencies. If the left side of a functional dependency is a subset of the closure computed so far, add its right side to the closure.]
[Record which functional dependencies were selected at each step.]
[Functional dependencies selected in one round need not be considered in later rounds.]
(3) X(i+1) = B ∪ X(i).
(4) Is X(i+1) = X(i) or X(i+1) = U? If yes, go to (5); otherwise go to (6).
(5) X(i+1) is XF+, and the algorithm terminates.
(6) Otherwise, set i = i + 1 and return to (2).
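Algorithm 1 can be traced step by step in Python. The sketch records X(0), X(1), ... and the FDs selected in each round. It drops an FD once selected, as the note on step (2) allows. FDs are (lhs, rhs) attribute strings, and the FD set in the example is an illustrative assumption.

```python
def closure_trace(x, fds, universe):
    """Algorithm 1: return [X(0), X(1), ...] and the FDs selected in each round."""
    steps = [set(x)]                  # step (1): X(0) = X
    rounds = []
    unused = list(fds)
    while True:
        current = steps[-1]
        # Step (2): select unused FDs whose left side lies inside X(i).
        chosen = [(v, w) for v, w in unused if set(v) <= current]
        b = set().union(*(set(w) for _, w in chosen))
        unused = [fd for fd in unused if fd not in chosen]   # not reconsidered later
        steps.append(current | b)                            # step (3)
        rounds.append(chosen)
        if steps[-1] == current or steps[-1] == set(universe):
            return steps, rounds                             # steps (4) and (5)
        # Step (6): otherwise i += 1 and loop again.

fds = [("AB", "C"), ("B", "D"), ("C", "E"), ("EC", "B"), ("AC", "B")]
steps, rounds = closure_trace("AB", fds, "ABCDE")
for i, s in enumerate(steps):
    print(f"X({i}) =", "".join(sorted(s)))
# X(0) = AB
# X(1) = ABCD
# X(2) = ABCDE
```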

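Number of times the algorithm loops
Let ai = |X(i)|. Each round that does not stop the algorithm adds at least one attribute, so a0 < a1 < ..., and every ai ≤ |U|. Hence at most |U| - |X| rounds can add attributes, and the algorithm terminates.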

All keys are called candidate keys

Theorem 2

Completeness of Armstrong's axiom system

Prove the contrapositive: if the functional dependency X->Y cannot be derived from F by Armstrong's axioms, then it is not implied by F.

If V->W is in F and V ⊆ XF+ (meaning X->V), then W ⊆ XF+.

Construct a two-row table r whose two tuples agree on every attribute of XF+ and differ on every attribute of U - XF+. By the previous step, r satisfies every FD in F, so r is a relation of R(U,F).

If X->Y cannot be derived from F by Armstrong's axioms, then Y is not a subset of XF+, so some nonempty Y' ⊆ Y satisfies Y' ⊆ U - XF+. The two tuples agree on X but not on Y', so X->Y does not hold in r. Hence X->Y is not implied by F.

Definition 3

If G+ = F+, we say that F covers G (F is a cover of G, and G is a cover of F), or that F and G are equivalent.

Lemma 3

The necessary and sufficient conditions for F+ = G+ are that F is contained in G+ and G is contained in F+

prove:
Necessity is obvious.
Sufficiency:
If F ⊆ G+, then XF+ ⊆ XG+.
For any X->Y in F+, Y ⊆ XF+ ⊆ XG+, so X->Y belongs to (G+)+ = G+; hence F+ ⊆ G+.
In the same way, G+ ⊆ F+, so F+ = G+.

Algorithm for judging whether two functional dependency sets are equivalent:
To determine whether F ⊆ G+, check each X->Y in F one by one and test whether Y ⊆ XG+.
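Following Lemma 3, equivalence reduces to two coverage checks, and each check only needs attribute closures. Here is a sketch with hypothetical names (covers, equivalent) and the (lhs, rhs) FD convention:

```python
def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """True if every X->Y in g has Y inside X+ computed under f, i.e. g is contained in f+."""
    return all(set(y) <= closure(x, f) for x, y in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

f = [("A", "B"), ("B", "C")]
print(equivalent(f, [("A", "BC"), ("B", "C")]))  # True
print(equivalent(f, [("A", "B"), ("A", "C")]))   # False, because B->C is not implied
```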

Definition 4

If a set of functional dependencies F satisfies the following conditions, F is called a minimal functional dependency set, also known as a minimal dependency set or minimal cover:
The right side of every functional dependency in F contains only one attribute.
There is no functional dependency X->A in F such that F - {X->A} is equivalent to F.
[To test whether X->A is redundant, let G = F - {X->A} and check whether the closure of X with respect to G includes A.]
[That is, check whether the set is still equivalent after removing this functional dependency; the purpose is to remove redundant functional dependencies.]
There is no functional dependency X->A in F such that X has a proper subset Z for which (F - {X->A}) ∪ {Z->A} is equivalent to F.
[Make sure each left side is minimal.]
  • Right sides are single attributes

  • Remove redundant functional dependencies

  • Minimize left sides

Theorem 3

Each functional dependency set is equivalent to a minimal functional dependency set Fm. This Fm is called a minimum dependency set

prove:
Split the right sides into single attributes. [This only needs to be done once.]
Check each functional dependency X->A in F one by one: let G = F - {X->A}; if A ∈ XG+, remove X->A from F. [A can be derived through the other functional dependencies.]
Check each functional dependency X->A in F one by one: let X = B1B2...Bm; for each Bi, if A ∈ (X - Bi)F+, replace X with X - Bi. [Does the closure of the left side still contain A after removing Bi?]
[Repeat until F no longer changes.]
What remains in the end must be an equivalent minimal set of dependencies.
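The proof's three steps also work as an algorithm. The sketch below splits right sides once. It then alternately removes redundant FDs and redundant left-side attributes until F stops changing. FDs are (lhs, rhs) attribute strings, and the examples are illustrative. As the discussion of minimal dependency sets notes, a different processing order can produce a different, equally valid Fm.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_basis(fds):
    """Single right sides, no redundant FD, no redundant left-side attribute."""
    # Step 1: split right sides into single attributes (done once) and drop duplicates.
    f = list(dict.fromkeys(("".join(sorted(x)), a) for x, y in fds for a in sorted(y)))
    changed = True
    while changed:                       # repeat until F no longer changes
        changed = False
        # Step 2: remove X->A when A is in X+ computed from G = F - {X->A}.
        for fd in list(f):
            g = [h for h in f if h != fd]
            if fd[1] in closure(fd[0], g):
                f = g
                changed = True
        # Step 3: remove Bi from X when A is still in (X - Bi)+ computed from F.
        for i in range(len(f)):
            x, a = f[i]
            for b in sorted(x):
                if len(x) > 1 and a in closure(x.replace(b, ""), f):
                    x = x.replace(b, "")
                    f[i] = (x, a)
                    changed = True
        f = list(dict.fromkeys(f))       # left-side reduction may create duplicates
    return f

print(minimal_basis([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]))  # [('A', 'B'), ('B', 'C')]
print(minimal_basis([("AB", "C"), ("A", "B")]))                           # [('A', 'C'), ('A', 'B')]
```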

If several functional dependencies are found to be removable in one scan, removing them in different orders can yield different Fm.

It is easier to express it this way.

Discussion of minimal dependency set

The minimal functional dependency set Fm of F is not necessarily unique. It depends on the order in which the functional dependencies FDi, and the attributes within each FDi, are processed.
If the modified F is the same as the original F, it means that F itself is a minimum dependency set

Discussion of Equivalent Dependency Sets

Consider two relation schemas R1(U,F) and R2(U,G). If F and G are equivalent, then every relation of R1 is a relation of R2, and every relation of R2 is a relation of R1.
Therefore, in R(U,F), F may be replaced by any dependency set G that is equivalent to F.


Origin blog.csdn.net/m0_62153438/article/details/134893709