[Database Theory] Course Notes of "Database Principles"

Database Principles

1. Concept introduction

Database: An organized, sharable collection of data with low redundancy, high data independence, and easy extensibility.

Database Management System (DBMS): A layer of data management software that sits between the user and the operating system for organizing, accessing, and maintaining data. It is also sometimes referred to directly as a database system.

Basic functions of DBMS:

Database definition; data access; database operation management; data organization, storage, and management; database establishment and maintenance; and other functions such as network communication, data conversion, and mutual access between heterogeneous databases.

The three levels of abstraction of a database are the view level, the logical level, and the physical level; they correspond to the external schema, the schema, and the internal schema, respectively.

External Schema: It is the data view of the database user and the data representation related to an application;

A database can have multiple external schemas.

Schema: A view at the logical level, which is a description of the logical structure and characteristics of the entire data in the database;

A database has only one schema.

Internal Schema: The description of the physical structure and storage mode of data, that is, how the data is represented inside the database;

A database has only one internal schema.

Data Independence:

Physical independence - the schema and application do not change when the internal schema changes;

Logical independence - when the schema changes, the external schema and the application change little or not at all (the application depends on the external schema).

The three-level schema structure and the two levels of mapping between them (external schema/schema and schema/internal schema) realize the data independence of the database system.

2. Relational Model

A data model is described by three elements:

Data structures (data, connections between data, etc.)

Data manipulation (operation type, operation method, etc.)

Constraints (semantics of data, constraints between data, etc.)

The basic process of data modeling: conceptual model - logical model - physical model.

2.1, relational operators:

Relational algebra is an abstract query language that expresses queries using traditional set operations and specialized relational operations.

Selection (σ): The selection operation works from the perspective of rows; it selects from the relation R the tuples that make the logical expression F true;

Projection (π): The projection operation works from the perspective of columns, but after the projection not only are some columns of the original relation removed, some tuples may be removed as well (to avoid duplicate rows);

Join (⨝):

Equi-join: a join whose condition compares attributes with "=", the kind we use most often;

Natural join: the components being compared must be the same attribute group, and the duplicate attributes are removed from the result;

Semi-join: the natural join of R and S, with the result projected onto the attributes of R only;

Outer join: the natural join of R and S in which unmatched tuples are kept and their missing attributes are filled with null values. There are left outer joins, right outer joins, and full outer joins.

Left outer join of R and S: the natural join of R and S plus all the unmatched tuples of R, padded with null values;

Right outer join of R and S: the natural join of R and S plus all the unmatched tuples of S, padded with null values;

Full outer join of R and S: the natural join of R and S plus all the unmatched tuples of both R and S, padded with null values.

Division (÷): R÷S is defined over the attributes of R that do not appear in S; it contains the tuples t such that, for every tuple s of S, the combined tuple (t, s) belongs to R.
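To make the operators above concrete, here is a minimal sketch that models a relation as a list of Python dicts. The helper names (`select`, `project`, `natural_join`, `divide`) and the sample data are assumptions made for this illustration; they do not come from any library or from the course material.

```python
def select(rel, pred):
    """σ: keep the tuples for which pred is true (row perspective)."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """π: keep only attrs and drop duplicate rows (column perspective)."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(r, s):
    """⨝: combine tuples that agree on all common attributes."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

def divide(r, s):
    """÷: tuples over (attributes of R not in S) that R pairs with every tuple of S.
    Assumes both relations are non-empty."""
    s_attrs = list(s[0])
    rest = [a for a in r[0] if a not in s_attrs]
    r_set = {tuple(t[a] for a in rest + s_attrs) for t in r}
    s_set = {tuple(u[a] for a in s_attrs) for u in s}
    return [c for c in project(r, rest)
            if all(tuple(c[a] for a in rest) + v in r_set for v in s_set)]

# Hypothetical sample data: which student took which course.
takes = [
    {"ID": 1, "course": "DB"}, {"ID": 1, "course": "OS"},
    {"ID": 2, "course": "DB"},
]
required = [{"course": "DB"}, {"course": "OS"}]

print(divide(takes, required))      # [{'ID': 1}]: only student 1 took every required course
print(project(takes, ["course"]))   # [{'course': 'DB'}, {'course': 'OS'}]
```

Division answers "for all" questions: student 2 is dropped because the pair (2, OS) is missing from `takes`.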

2.2, set operators

Union ∪, difference −, intersection ∩, and the Cartesian product ×

R ∪ S: The result is still an n-ary relation, consisting of the tuples that belong to R or S; the corresponding SQL is union;

It is typically used to combine the results of several select statements, which is similar to querying with several alternative conditions;

R − S: The result is still an n-ary relation, consisting of the tuples that belong to R but not to S; the corresponding SQL is except; (MySQL supports it only since version 8.0.31)

R ∩ S: The result is still an n-ary relation, consisting of the tuples that belong to both R and S; the corresponding SQL is intersect; (MySQL supports it only since version 8.0.31)

Cartesian product: The Cartesian product of an n-column relation R and an m-column relation S is a set of tuples with (n+m) columns, where the first n columns are a tuple of R and the last m columns are a tuple of S. If R has x tuples and S has y tuples, then the Cartesian product of R and S has x*y tuples.

Taking the Cartesian product directly produces many meaningless tuples; a natural join usually gives the desired result.

select attribute from table1,table2; # a comma-separated table list is a Cartesian product by default
select attribute from table1 cross join table2;
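The difference in result size between a Cartesian product and a natural join can be checked with a small sketch. It uses Python's built-in `sqlite3` module with an in-memory database; the table contents are made-up sample data, not the data set used in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor(name text, dept_name text);
    create table department(dept_name text, building text);
    insert into instructor values ('Kim', 'Physics'), ('Wu', 'Finance');
    insert into department values ('Physics', 'Watson'), ('Finance', 'Painter'),
                                  ('Music', 'Packard');
""")

# Cartesian product: 2 * 3 = 6 rows, most of them meaningless pairings.
cross = conn.execute("select * from instructor cross join department").fetchall()

# Natural join: only the rows whose dept_name values match.
joined = conn.execute(
    "select name, building from instructor natural join department order by name"
).fetchall()

print(len(cross))   # 6
print(joined)       # [('Kim', 'Watson'), ('Wu', 'Painter')]
```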

demo:

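#2, find the office building of instructor Kim;
# `ρ i instructor` gives instructor the alias i,
# Method 1: take the Cartesian product first, then keep the tuples whose departments match
π d.building (σ i.name = 'Kim' and i.dept_name = d.dept_name (ρ i instructor ⨯ ρ d department))
# Method 2: take the natural join, which yields the matching tuples directly
π d.building (σ i.name = 'Kim'  (ρ i instructor ⨝ ρ d department))

#4, find the students whose grade in the database course is above 90;
π s.name (σ (c.title = 'database' and t.grade > '90') (ρ t takes ⨝ ρ c course ⨝ ρ s student))

#5, find the students who have not taken any course
π b.ID, b.name (σ a.ID = b.ID ((ρ a ((π ID student) - (π ID takes))) ⨯ (ρ b student)))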

3. SQL language

SQL (Structured Query Language) is the standard language of relational databases.

Classification

DDL (Data Definition Language)

create, drop, alter, etc.; used to define relational schemas, attribute domains, integrity constraints, indexes, views, etc.

DML (Data Manipulation Language)

select, insert, delete, update

DCL (Data Control Language)

grant, revoke

Common data types

3.2, other common structures

Views: A view always shows the latest data. Whenever a user queries a view, the database engine rebuilds the data by running the view's SQL statement.

create view view_name as select_statement;
# usage
select * from view_name;
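The claim that a view always reflects the current base data can be checked with a short sketch. It assumes SQLite through Python's built-in `sqlite3` module; the table `student`, the view `cs_student`, and the rows are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table student(ID integer primary key, name text, dept_name text);
    insert into student values (1, 'Zhang', 'CS'), (2, 'Li', 'Math');
    create view cs_student as select ID, name from student where dept_name = 'CS';
""")

before = conn.execute("select * from cs_student order by ID").fetchall()

# Change the base table only; the view definition is not touched.
conn.execute("insert into student values (3, 'Wang', 'CS')")

# The view's query is evaluated again, so the new row shows up.
after = conn.execute("select * from cs_student order by ID").fetchall()

print(before)  # [(1, 'Zhang')]
print(after)   # [(1, 'Zhang'), (3, 'Wang')]
```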

Stored procedure: A set of pre-written SQL statements that encapsulates a certain function. It is compiled when it is created, so later calls run directly without being compiled again, which is usually more efficient than sending the statements one by one.

create procedure procedure_name(in attribute char(6))
begin
    sql_statement;
end;
# usage
call procedure_name(value);

Index: Creating an index can improve query efficiency, because the engine can locate rows through the index instead of scanning the whole table. However, too many indexes also cause efficiency problems: every insert, update, and delete has to maintain them, and they take up storage.

create index index_name on table_name(attribute);
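One way to see the effect of an index is to compare the query plan before and after creating it. The sketch below assumes SQLite through Python's `sqlite3` module and uses its `explain query plan` statement; the exact wording of the plan text differs between SQLite versions, so it is printed rather than quoted here. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student(ID integer primary key, name text)")
conn.executemany("insert into student values (?, ?)",
                 [(i, f"name{i}") for i in range(1000)])

def plan(sql):
    """Return the textual steps of SQLite's query plan for sql."""
    return [row[-1] for row in conn.execute("explain query plan " + sql)]

query = "select * from student where name = 'name500'"
plan_before = plan(query)      # without an index: a scan of the whole table
conn.execute("create index idx_student_name on student(name)")
plan_after = plan(query)       # with the index: a search through idx_student_name

print(plan_before)
print(plan_after)
```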

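Trigger: A trigger automatically executes a set of statements when a certain event (insert, update, or delete) occurs. But try not to use it, because a row-level trigger runs once for every affected row of the table, which is a performance risk, and triggers are invisible to the application and are easily overlooked by developers.

# create a trigger that deletes a student's course-selection records after the student is deleted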
create trigger trigger_delete after delete
on student for each row
begin
   delete from takes where takes.ID not in
   (select ID from student);
end;
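The following is a runnable variant of the trigger above, sketched for SQLite through Python's `sqlite3` module. Note the swapped-in technique: instead of the `not in` subquery it uses `old.ID`, the ID of the row that was just deleted, so each firing touches only that student's rows. The sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table student(ID integer primary key, name text);
    create table takes(ID integer, course_id text);
    insert into student values (1, 'Zhang'), (2, 'Li');
    insert into takes values (1, 'DB'), (1, 'OS'), (2, 'DB');

    -- old.ID is the ID of the student row that was just deleted
    create trigger trigger_delete after delete on student
    for each row
    begin
        delete from takes where takes.ID = old.ID;
    end;
""")

conn.execute("delete from student where ID = 1")
remaining = conn.execute("select ID, course_id from takes").fetchall()
print(remaining)  # [(2, 'DB')]
```

The application only deleted from `student`; the rows in `takes` disappeared as a side effect, which is exactly why triggers are easy to overlook.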

4. Database Design

4.1, data table structure

Tuples: rows in a two-dimensional table; attributes: columns in a two-dimensional table.

Superkey: a set of one or more attributes that uniquely identifies a tuple
Candidate key: a superkey with no redundant attributes (a minimal superkey)
Primary key: the candidate key chosen by the user to identify tuples
Foreign key: a set of attributes in the current schema that is the primary key of another schema
Prime attribute: an attribute that belongs to some candidate key

Specifically, in a relation r(id, name), (id, name) can be a super key, and id is a candidate key. There can be multiple candidate keys, but only one primary key.

In the simplest case, a candidate key contains only one attribute. In the extreme case, all attributes of a relational schema together form its only candidate key, which is called an all-key.

4.2, relational database logic design

The relational schema consists of five parts and is a quintuple: R(U, D, DOM, F)

R is the relation name, a symbol that stands for the semantics of the tuples
U is a set of attributes
D is the set of domains from which the attributes in U take their values
DOM is the mapping from attributes to domains
F is a set of data dependencies on the attribute set U

Since D and DOM have little to do with schema design, we mainly regard the relational schema as a triple: R<U,F>. If and only if a relation r on U satisfies F, r is called a relation of relational schema R<U,F>.

Main types of data dependencies:

Functional dependency (abbreviated as FD);

Multi-valued dependency (abbreviated as MVD).

4.2.1, functional dependencies

Definition: Let R(U) be a relational schema on an attribute set U, and let X and Y be subsets of U. If, for every possible relation r of R(U), it is impossible for two tuples in r to have equal values on X but unequal values on Y, then we say "X functionally determines Y" or "Y functionally depends on X", denoted X→Y. Functional dependencies are ubiquitous in real life.
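The definition can be turned into a small check on one relation instance. Keep in mind that a functional dependency is a statement about every possible relation of the schema, so a check like this can only refute a dependency on the given data, never prove it. The function name `satisfies_fd` and the sample rows are assumptions made for the example.

```python
def satisfies_fd(rows, X, Y):
    """Return True if no two rows agree on X but differ on Y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        # Remember the first Y value seen for this X value and compare later ones to it.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

students = [
    {"Sno": "S1", "Sdept": "CS",   "Sloc": "North"},
    {"Sno": "S2", "Sdept": "CS",   "Sloc": "North"},
    {"Sno": "S3", "Sdept": "Math", "Sloc": "South"},
]

print(satisfies_fd(students, ["Sdept"], ["Sloc"]))  # True
print(satisfies_fd(students, ["Sloc"], ["Sno"]))    # False
```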

A functional dependency X→Y is trivial if Y ⊆ X and non-trivial otherwise. Trivial functional dependencies always hold, so we only discuss non-trivial functional dependencies.

4.3, Normal forms

A normal form is the set of relational schemas that meet a certain level of requirements. A relation in a relational database must meet certain requirements, and different normal forms correspond to different levels of requirements.

1NF: Every component of the relation is a single atomic value that cannot be subdivided.

For example, an attribute column holds exactly one value in each tuple and cannot hold two values at the same time.

2NF: Satisfies 1NF, and every non-prime attribute is fully functionally dependent on every candidate key. That is, no non-prime attribute is partially functionally dependent on a key.

e.g. Consider the relational schema SLC (Sno, Sdept, Sloc, Cno, Grade), where Sloc is the students' residence and the students of each department live in the same place. The key of SLC is (Sno, Cno). Sdept and Sloc depend on Sno alone, which is only part of the key, so SLC is not in 2NF.

If 2NF is not satisfied, problems arise, such as:

Insertion anomaly

If a new student who has not yet taken any course is inserted, the student has no Cno and the insertion fails, because the key value must be given when a tuple is inserted.

Deletion anomaly

If S4 has taken only one course, C3, and now drops it, then deleting C3 deletes the whole tuple together with S4's other information.

Complicated updates

If a student takes multiple courses, Sdept and Sloc are stored multiple times. If the student transfers to another department, every stored copy of Sdept and Sloc must be modified, which complicates the update.

For example, take the relation (student number, name, course number, teacher) with key (student number, course number): name depends on student number alone, which is a partial dependency on the key.

It can be decomposed into: (student number, name), (student number, course number, teacher);

3NF: Satisfies 2NF and eliminates the transitive dependence of non-prime attributes on candidate keys, i.e. no non-prime attribute depends on a set of attributes that is not a superkey.

For example, in the relation (student number, department name, department chair), student number → department name and department name → department chair, so department chair depends on the key only transitively, whereas 3NF requires non-prime attributes to depend on the key directly.

It can be decomposed into: (student number, department name), (department name, department chair)

4.4, Functional Dependency Theory

4.4.1, find the closure of the attribute set X on the functional dependency set F

  1. Let $X^{(0)} = X$, $i = 0$;
  2. Find B, the set of attributes A for which F contains a functional dependency V→W with $V \subseteq X^{(i)}$ and $A \in W$;
  3. $X^{(i+1)} = B \cup X^{(i)}$;
  4. Check whether $X^{(i+1)} = X^{(i)}$;
  5. If they are equal, or if $X^{(i+1)} = U$, then $X^{(i+1)}$ is the closure $X_F^{+}$ and the algorithm terminates;
  6. Otherwise, set $i = i + 1$ and go back to step 2.
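The steps above can be sketched in a few lines of Python. This is an illustrative implementation under simple assumptions: every attribute is a single character, and F is a list of `(left, right)` string pairs. The function name `closure` is chosen for the example. The dependency set is the one used in the lossless-join exercise of these notes.

```python
def closure(X, F):
    """Closure of attribute set X under F, a list of (left, right) pairs."""
    result = set(X)                      # X^(0) = X
    changed = True
    while changed:                       # stop when X^(i+1) == X^(i)
        changed = False
        for left, right in F:
            # V -> W applies when V is a subset of the current set; add W.
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result

F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(sorted(closure("A", F)))   # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("B", F)))   # ['B', 'D']
```

Since the closure of A is the whole attribute set, A is a superkey of R(A,B,C,D,E) under this F, while B is not.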

4.4.2, Solving the Minimum Functional Dependency Set

A minimal functional dependency set is a functional dependency set without any redundancy.

algorithm:

1. Split every dependency so that its right side is a single attribute;

2. Remove the redundant attributes from the left sides of the dependencies in F;

3. Remove redundant dependencies: starting from the first one, take a dependency X→Y, remove it from F, and compute the closure $X^+$ under the remaining dependencies; if $X^+$ still contains Y, the dependency is redundant and can be removed.
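A minimal sketch of these three steps follows, assuming single-character attributes and F given as a list of `(left, right)` string pairs. The names `closure` and `minimal_cover` and the sample dependency set are assumptions made for the example. A minimal cover is generally not unique, so the result depends on the order in which attributes and dependencies are examined.

```python
def closure(X, F):
    """Closure of attribute set X under F, a list of (left, right) pairs."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for left, right in F:
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result

def minimal_cover(F):
    # Step 1: split every right side into single attributes.
    G = [(frozenset(left), a) for left, right in F for a in right]
    # Step 2: drop left-side attributes that are not needed to derive the right side.
    for i, (left, a) in enumerate(G):
        for attr in sorted(left):
            reduced = G[i][0] - {attr}
            if reduced and a in closure(reduced, G):
                G[i] = (reduced, a)
    # Step 3: drop dependencies that follow from the remaining ones.
    result = list(dict.fromkeys(G))          # also removes exact duplicates
    for fd in list(result):
        rest = [g for g in result if g != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return ["".join(sorted(left)) + "→" + a for left, a in result]

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
print(minimal_cover(F))   # ['A→B', 'B→C']
```

Here AB→C shrinks to B→C in step 2, and A→C is dropped in step 3 because it follows from A→B and B→C.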

4.4.3, testing for a lossless join

Given the relational schema R(A,B,C,D,E) and the functional dependency set F={A→BC, CD→E, B→D, E→A}, suppose R is decomposed into R1(A,B,C) and R2(A,D,E).
Question: is this decomposition a lossless join decomposition?

Algorithm for testing whether a decomposition has a lossless join:

Build a two-dimensional table with one row per schema and one column per attribute,

1. If the attribute belongs to the schema of the row, fill the corresponding cell with $a_j$; otherwise fill it with $b_{ij}$ (i is the row index and j is the column index);

2. According to the dependencies in the dependency set, update the values in the two-dimensional table one by one:

For each functional dependency X→Y, find the rows that have the same values in the X columns and look at their Y columns: if one of these rows holds $a_j$ in such a column, set that column to $a_j$ in all of them; otherwise set it to the same $b_{ij}$ in all of them;

Repeat this for all dependencies until some row of the table has the form $a_1 a_2 \ldots a_n$, which means the decomposition is lossless and the algorithm exits; if the table stops changing without such a row, the decomposition is lossy.

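Initial table:

schema   | A   B    C    D    E
R1(ABC)  | a1  a2   a3   b14  b15
R2(ADE)  | a1  b22  b23  a4   a5

After applying A→BC (both rows agree on A, so B and C of R2 become a2 and a3):

schema   | A   B    C    D    E
R1(ABC)  | a1  a2   a3   b14  b15
R2(ADE)  | a1  a2   a3   a4   a5

The row of R2 now consists entirely of a values, so the decomposition is lossless.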
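The table method can also be sketched in code. This is an illustrative implementation under simple assumptions, not the course's reference solution: attributes are single characters, the cell value `"a"` stands for $a_j$ (the column is implicit), and `"b0"`, `"b1"`, ... stand for the $b_{ij}$ of row i. The function name `is_lossless` is hypothetical.

```python
def is_lossless(attrs, schemas, F):
    """Table test for a lossless join; F is a list of (left, right) pairs."""
    # Cell (i, A) starts as 'a' if A is in schema i, otherwise as the row's own b symbol.
    table = [{A: "a" if A in s else f"b{i}" for A in attrs}
             for i, s in enumerate(schemas)]
    changed = True
    while changed:
        changed = False
        for left, right in F:
            # Group the rows that agree on every attribute of the left side.
            groups = {}
            for row in table:
                groups.setdefault(tuple(row[A] for A in left), []).append(row)
            for rows in groups.values():
                for A in right:
                    values = {row[A] for row in rows}
                    if len(values) > 1:
                        # Prefer 'a'; otherwise pick one b symbol for the whole group.
                        target = "a" if "a" in values else min(values)
                        for row in table:
                            if row[A] in values:
                                row[A] = target
                        changed = True
    # Lossless if some row consists of 'a' values only.
    return any(all(v == "a" for v in row.values()) for row in table)

F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(is_lossless("ABCDE", ["ABC", "ADE"], F))          # True
print(is_lossless("ABC", ["AB", "BC"], [("A", "B")]))   # False
```

The second call is lossy because the common attribute B determines neither side of the decomposition under the single dependency A→B.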

Finding the candidate keys:
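One straightforward, if exponential, way to find the candidate keys is to try attribute sets in order of increasing size and keep those whose closure is the whole attribute set and which contain no smaller key. The sketch below is a brute-force illustration under the assumption of single-character attributes; it is not necessarily the method taught in the course, and the function names are chosen for the example.

```python
from itertools import combinations

def closure(X, F):
    """Closure of attribute set X under F, a list of (left, right) pairs."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for left, right in F:
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result

def candidate_keys(attrs, F):
    """All minimal attribute sets whose closure is the whole attribute set."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, F) == set(attrs):
                keys.append("".join(combo))
    return keys

F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(candidate_keys("ABCDE", F))   # ['A', 'E', 'BC', 'CD']
```

For R(A,B,C,D,E) with this F, every attribute belongs to some candidate key, so every attribute is a prime attribute.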

5. Transactions

The concept of transaction: A transaction is an indivisible logical unit of work in database execution, and is the basic unit of recovery and concurrency control.

Atomicity: A transaction is an indivisible unit; either all of its operations are executed or none of them is;

Consistency: The result of executing a transaction must take the database from one consistent state to another, that is, the integrity constraints hold both before and after the transaction;

Isolation: Transactions executed concurrently cannot interfere with each other, and their effect on the database is the same as if they were executed serially. A simple way to understand it is that the changes a transaction makes before committing are invisible to other transactions;

Durability: Once a transaction is committed, its updates to the database are permanent and are not lost because of subsequent operations or failures.
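Atomicity can be observed with a small money-transfer sketch. It assumes SQLite through Python's `sqlite3` module, where using the connection as a context manager commits on success and rolls back when an exception is raised. The `account` table, the `transfer` function, and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account(name text primary key, "
             "balance integer check (balance >= 0))")
conn.executemany("insert into account values (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("update account set balance = balance + ? where name = ?",
                         (amount, dst))
            conn.execute("update account set balance = balance - ? where name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "A", "B", 500)   # would make A negative, so everything is rolled back
balances = dict(conn.execute("select name, balance from account"))
print(ok, balances)                  # False {'A': 100, 'B': 50}
```

The credit to B was executed first, yet it is undone together with the failed debit: the transfer happens entirely or not at all.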

Transaction Manager:

Pass messages about transaction actions to the log manager;

Pass messages to the buffer manager about when buffers can or must be copied back to disk;

Pass operation messages such as database queries to the query processor.

Recovery Manager:

Recovery Manager is activated when the system crashes;

It examines the log and utilizes the log to restore data if necessary

Log Manager:

Maintains the log, recording all modification operations on the database

Must deal with the buffer manager, because access to disk is through the buffer manager

Buffer manager:

Allocate, manage and reclaim buffers;

Decide when to write the buffer's data back to disk (immediate modification/deferred modification)

Integrity constraints: entity integrity, referential integrity, and user-defined integrity.

Entity integrity: the prime attributes (primary key) of a relation must be unique and cannot be null. User-defined integrity: a condition that the data of a particular application must satisfy, for example that an age cannot be negative.

Referential integrity: concerns the reference between two tables; a foreign key value must either be null or equal the primary key value of some tuple in the referenced table.

When the referential integrity constraint and the entity integrity constraint cannot be satisfied at the same time, the entity integrity constraint will be satisfied first.
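The three kinds of integrity constraint can be declared in the table definitions and are then enforced by the DBMS. The sketch below assumes SQLite through Python's `sqlite3` module; note that SQLite checks foreign keys only after `pragma foreign_keys = on`. Table contents and the helper `rejected` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
    create table student(
        ID   integer primary key,                            -- entity integrity
        name text not null,
        age  integer check (age >= 0)                        -- user-defined integrity
    );
    create table takes(
        ID        integer not null references student(ID),   -- referential integrity
        course_id text not null,
        primary key (ID, course_id)
    );
    insert into student values (1, 'Zhang', 20);
""")

def rejected(sql):
    """Run sql and report whether it was rejected as a constraint violation."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = [
    rejected("insert into student values (1, 'Li', 19)"),   # duplicate primary key
    rejected("insert into student values (2, 'Li', -5)"),   # negative age
    rejected("insert into takes values (9, 'DB')"),         # no student with ID 9
    rejected("insert into takes values (1, 'DB')"),         # valid row
]
print(results)   # [True, True, True, False]
```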

Origin blog.csdn.net/qq_40589204/article/details/118569601