[Soft Exam - Notes on Essential Knowledge Points for Software Designers] Chapter 9 Fundamentals of Database Technology

Preface

The notes I copied to CSDN lost their original styling, and I don't have the energy to re-check and restyle everything, so Word, PDF, and Youdao Cloud Note versions are available for download (points required).
Note that the downloadable content is identical to what is shared in this article; the only difference is the styling [for example, key memory points and frequently tested content are marked with colors, font sizes and weights, the directory structure is more complete, and tables are real tables rather than pictures].

Download address of this chapter:
https://download.csdn.net/download/chengsw1993/86261349

If you notice reading problems, abnormal display, etc., please let us know in the comment area so we can fix them; such issues are most likely caused by CSDN's markdown rendering.

Series of articles

Previous article:[Soft Exam - Notes on Essential Knowledge Points for Software Designers] Chapter 8 Algorithm Analysis and Design

Next article:[Soft Exam - Notes on Essential Knowledge Points for Software Designers] Chapter 10 Network and Information Security

Basic concepts

DBS: Database System
DBA: Database Administrator

Three-level schema and two-level mapping

Internal schema: describes how data is physically stored and corresponds to the actual physical storage files; the basic table files and index files together constitute the internal schema of the database system.

Schema: also called the conceptual schema; it is the set of basic tables we normally use, organizing the physical data into tables according to applications and requirements.

External schema: corresponds to the view level of the database; tables are processed to some extent before being presented to the user.

External schema-schema mapping: the mapping between tables and views, sitting between the conceptual level and the external level. If the tables (conceptual schema) are modified, only this mapping needs to change and the applications are unaffected; this provides logical data independence.

Schema-internal schema mapping: the mapping between the tables and the physical storage of the data, sitting between the conceptual level and the internal level. If the physical storage method is changed, only this mapping needs to change and the applications are unaffected; this provides physical data independence.

Database Design

Requirements analysis: analyze the data-storage requirements. The output artifacts include data flow diagrams, a data dictionary, and the requirements specification.

Conceptual structure design: produce the E-R (entity-relationship) diagram, which is independent of physical implementation and states which entities exist and which attributes each entity has.

Logical structure design: convert the E-R diagram into relational schemas, that is, into actual tables and their column attributes; normalization is considered at this stage.

Physical design: based on the relational schemas produced above, design the physical database (storage structures, indexes, and so on).

E-R model

The three elements of the data model: data structure (a collection of object types under study), data operations (a collection of operations that are allowed to be performed on instances of various objects in the database), and data constraints (a set of integrity rules).

E-R model: the entity-relationship model uses ovals for attributes (often omitted in diagrams), rectangles for entities, and diamonds for relationships. The relationship (cardinality) type must be marked at both ends of each relationship.

Relationship types: one-to-one (1:1), one-to-many (1:N), many-to-many (M:N).

Attribute classification: simple vs. composite attributes (whether the attribute can be subdivided), single-valued vs. multi-valued attributes (whether the attribute can take several values), NULL attributes (no value / unknown), and derived attributes (computable from other attributes).

Relational model

A relational model is just the familiar database table: it lists the entity's attributes and identifies the entity's primary key and foreign keys, for example:

S(Sno, Sname, SD, Sage, Sex): student relation schema S; the attributes are student number, name, department, age and sex
T(Tno, Tname, Age, Sex): teacher relation schema T; the attributes are teacher number, name, age and sex
C(Cno, Cname, Pcno): course relation schema C; the attributes are course number, course name and prerequisite course number
SC(Sno, Cno, Grade): course-selection relation schema SC; the attributes are student number, course number and grade

Model conversion

Converting an E-R diagram into relational schemas: each entity becomes one relation schema; relationships are handled in three ways (a SQL sketch follows this list):
For a 1:1 relationship, the relationship can be merged into the relation of either entity by adding the other end's primary key as an attribute (which keeps the two ends associated), or kept as a separate schema.
For a 1:N relationship, the relationship can either become a separate relation schema or be merged into the N-side relation by adding the primary key of the 1-side entity.
For an M:N relationship, the relationship must become a separate relation schema whose primary key is the combined (joint) key of the M side and the N side.
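As a minimal sketch of the M:N case, the student/course/course-selection schemas listed earlier can be written as SQL DDL (the column types are assumptions for illustration):

```sql
-- Each entity becomes its own table.
CREATE TABLE S (Sno CHAR(8) PRIMARY KEY, Sname VARCHAR(20), SD VARCHAR(20), Sage INT, Sex CHAR(1));
CREATE TABLE C (Cno CHAR(8) PRIMARY KEY, Cname VARCHAR(40), Pcno CHAR(8));

-- The M:N "takes course" relationship becomes a separate table whose
-- primary key is the joint key of both ends.
CREATE TABLE SC (
    Sno   CHAR(8),
    Cno   CHAR(8),
    Grade INT,
    PRIMARY KEY (Sno, Cno),
    FOREIGN KEY (Sno) REFERENCES S(Sno),
    FOREIGN KEY (Cno) REFERENCES C(Cno)
);
```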

Relational algebra operations

Union: the result contains all records from the two relations, with identical records shown only once.
Intersection: the result contains the records that appear in both relations.
Difference: S1 - S2 yields the records that are in S1 but not in S2.

Cartesian product: S1 × S2. The result contains all attribute columns of S1 and S2, and each record of S1 is combined with every record of S2 to form one record. The number of attribute columns is the columns of S1 plus the columns of S2, and the number of records is the number of S1 records multiplied by the number of S2 records.

Projection: selects specified columns of a relation; the columns may also be referred to by position number. Symbol: π.

Selection: selects the records (rows) of a relation that satisfy a condition. Symbol: σ.

Natural join: the result contains all attribute columns, but columns with the same name appear only once; it keeps the records whose common attributes have equal values in the two relation schemas.

For example, suppose relation R has attributes (A, B, C) and relation S has attributes (A, C, D).

The relationship between natural join and Cartesian product, taking R and S above as an example:

The attribute sequence of R × S is A B C A C D (columns 1 to 6), while the natural join has attributes A B C D. Therefore, select the rows whose common columns are equal (column 1 = column 4 and column 3 = column 5), then project onto columns 1, 2, 3, 6:
R ⋈ S = π1,2,3,6( σ1=4 ∧ 3=5( R × S ) )

∧ means and
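The same result can be expressed in SQL; a minimal sketch, assuming tables R(A, B, C) and S(A, C, D) (NATURAL JOIN is supported by most dialects, e.g. MySQL and PostgreSQL, but not all):

```sql
-- Cartesian product, then selection on the common columns,
-- then projection via the SELECT list: equivalent to R ⋈ S.
SELECT R.A, R.B, R.C, S.D
FROM R, S                       -- R × S
WHERE R.A = S.A AND R.C = S.C;  -- σ: equal values on the shared attributes

-- Shorthand form of the natural join:
SELECT * FROM R NATURAL JOIN S;
```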

Normalization basics

Functional dependency

If a value of X uniquely determines a value of Y, we say that X determines Y, or that Y is functionally dependent on X, written X→Y; this is analogous to the function Y = X², where X determines Y.

Two further notions follow from functional dependency:

Partial functional dependency: if A alone determines C and (A, B) also determines C, then C depends on only part of (A, B) (namely A); this is called a partial functional dependency.

Transitive functional dependency: if A determines B, B determines C, and A and B are not equivalent (B does not determine A), then A determines C transitively; this is a transitive functional dependency. If A and B are equivalent, there is no transitivity and A determines C directly.
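A minimal SQL sketch of both dependency types, using a hypothetical wide table (all names and types are assumptions for illustration):

```sql
CREATE TABLE SC_wide (
    Sno   CHAR(8),      -- student number
    Cno   CHAR(8),      -- course number
    Sname VARCHAR(20),  -- student name
    Dept  VARCHAR(20),  -- department of the student
    Mgr   VARCHAR(20),  -- head of that department
    Grade INT,
    PRIMARY KEY (Sno, Cno)
);
-- (Sno, Cno) -> Grade     : full dependency on the whole key
-- Sno -> Sname            : depends on part of the key, a partial functional dependency
-- Sno -> Dept, Dept -> Mgr: Mgr depends on Sno transitively through Dept
```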

Keys and constraints

Superkey: a set of attributes that uniquely identifies a tuple (row) of the relation.

Candidate key: a superkey with all redundant attributes removed; the remaining attributes form a candidate key.

Primary key: Select any candidate key as the primary key.

Foreign key: an attribute (or attribute set) that is the primary key of another relation.

Prime attributes: attributes that appear in some candidate key are prime attributes; all other attributes are non-prime attributes.

Entity integrity constraints: that is, primary key constraints. The primary key value cannot be empty or repeated.

Referential integrity constraints: that is, foreign key constraints. The foreign key must be the value of the primary key that already exists in other tables, or be empty.

User-defined integrity constraints: Custom expression constraints, such as setting the age attribute value must be between 0 and 150.
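A minimal sketch showing all three kinds of integrity constraint in one DDL statement (the Tutor column and the referenced Teacher(Tno) table are hypothetical additions for illustration):

```sql
CREATE TABLE Student (
    Sno   CHAR(8) PRIMARY KEY,                  -- entity integrity: not null, no duplicates
    Sname VARCHAR(20),
    Sage  INT CHECK (Sage BETWEEN 0 AND 150),   -- user-defined integrity
    Sex   CHAR(1),
    Tutor CHAR(8),
    FOREIGN KEY (Tutor) REFERENCES Teacher(Tno) -- referential integrity: must match an existing Tno or be NULL
);
```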

*Normal Forms [Test Point]

First normal form (1NF): every attribute is atomic and cannot be divided into two or more components.

Second normal form (2NF): R is in 2NF if and only if R is in 1NF and every non-prime attribute fully depends on the candidate keys (there is no partial dependency). A typical case: if every candidate key is a single attribute, partial functional dependency is impossible, so the relation is automatically in 2NF.

Third normal form (3NF): R is in 3NF if and only if R is in 2NF and no non-prime attribute in R transitively depends on a candidate key (at this point there can be no partial dependencies either). The usual fix is to split the transitively dependent non-prime attributes into a new relation schema. The essence is that the key must determine every non-prime attribute directly, not indirectly through another non-prime attribute; a decomposition sketch follows below.
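Continuing the hypothetical SC_wide table from the functional-dependency sketch above, a 3NF decomposition removes both the partial and the transitive dependency (names and types are assumptions):

```sql
CREATE TABLE SC2      (Sno CHAR(8), Cno CHAR(8), Grade INT, PRIMARY KEY (Sno, Cno)); -- full dependency only
CREATE TABLE Student2 (Sno CHAR(8) PRIMARY KEY, Sname VARCHAR(20), Dept VARCHAR(20));
CREATE TABLE Dept2    (Dept VARCHAR(20) PRIMARY KEY, Mgr VARCHAR(20));               -- removes the transitive dependency
```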

Boyce-Codd normal form (BCNF): R is in BCNF if and only if the determinant of every dependency in F contains a candidate key of R. Example:

Suppose there is a relation schema R(S, T, J) with dependency set F = {SJ→T, T→J}.

By drawing the FD graph (or computing attribute closures), the candidate keys are found to be (S, T) and (S, J). Since the determinant T of T→J contains no candidate key, R is in 3NF but not in BCNF.

Determining the normal form of any relation starts with finding the candidate keys. In the FD-graph method, start from the attributes with in-degree 0; an attribute set is a candidate key if, starting from it, every attribute of the relation can be reached.

Database security

Transaction management

ACID properties: atomicity (of operations), consistency (of data), isolation (of execution), durability (of data changes).
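A minimal SQL sketch of a transaction, assuming a hypothetical Account table (START TRANSACTION is MySQL/PostgreSQL style; some dialects use BEGIN):

```sql
-- Either both updates take effect, or neither does (atomicity).
START TRANSACTION;
UPDATE Account SET balance = balance - 100 WHERE id = 1;
UPDATE Account SET balance = balance + 100 WHERE id = 2;
COMMIT;    -- the changes become durable
-- On error, ROLLBACK undoes everything done since the transaction began.
```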

Concurrency control

Problems: dirty reads, non-repeatable reads, phantom reads

Three-level locking protocols

X lock: an exclusive (write) lock. If transaction T places an X lock on data object A, only T may read and modify A; no other transaction may place any type of lock on A until T releases its lock on A.

S lock: a shared (read) lock. If transaction T places an S lock on data object A, T may only read A and may not modify it; other transactions may only place S locks on A (they may read but not modify) until T releases its S lock on A.

Level-1 locking protocol: a transaction must acquire an X lock on data item R before modifying it, and the lock is not released until the end of the transaction. The level-1 protocol prevents lost updates and guarantees that transaction T is recoverable, but it guarantees neither repeatable reads nor protection against reading "dirty" data.

Level-2 locking protocol: in addition to the level-1 protocol, transaction T must acquire an S lock on data item R before reading it, and the S lock may be released immediately after reading. The level-2 protocol prevents lost updates and prevents reading "dirty" data, but does not guarantee repeatable reads.

Level-3 locking protocol: in addition to the level-1 protocol, transaction T must acquire an S lock on data item R before reading it and hold it until the end of the transaction. The level-3 protocol prevents lost updates, prevents reading "dirty" data, and guarantees repeatable reads.

Two-phase locking protocol: every transaction locks and unlocks data items in two phases. In the growing phase, a transaction must acquire a lock on a data item before reading or writing it; in the shrinking phase, once the transaction has released any lock, it may not acquire any further locks. If all concurrently executing transactions obey the two-phase locking protocol, then every concurrent schedule of these transactions is serializable. Transactions that obey the two-phase locking protocol may still deadlock.
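Explicit row locks can be requested in SQL; a minimal sketch in MySQL/PostgreSQL-style syntax, reusing the hypothetical Account table from above:

```sql
START TRANSACTION;
SELECT * FROM Account WHERE id = 1 FOR UPDATE;  -- X (exclusive) lock on the selected row
SELECT * FROM Account WHERE id = 2 FOR SHARE;   -- S (shared) lock: others may read, not modify
-- Both locks are held until COMMIT/ROLLBACK.
COMMIT;
```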

A small locking granularity means high concurrency but high overhead; a large locking granularity means low concurrency but small overhead. The appropriate granularity should be chosen by balancing these different needs.

Database transactions have four isolation levels, from low to high: Read uncommitted, Read committed, Repeatable read, Serializable. Moving up through these levels successively eliminates dirty reads, non-repeatable reads, and phantom reads.

  1. Oracle supports 2 transaction isolation levels: READ COMMITTED and SERIALIZABLE. Oracle's default transaction isolation level is READ COMMITTED.

  2. MySQL supports all 4 transaction isolation levels. MySQL's default transaction isolation level is REPEATABLE READ.
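The isolation level can be set per transaction; a minimal sketch in MySQL-style syntax (PostgreSQL expects the SET inside the transaction block; the SC query is just an example):

```sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- applies to the next transaction
START TRANSACTION;
SELECT Grade FROM SC WHERE Sno = '2023001';  -- repeated reads in this transaction see the same rows
COMMIT;
```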

Database failure

[Figure: types of database failures]

Database backup

Static dump: Cold backup, which means that no access or modification operations are allowed on the database during the dump;

Its advantages are that the backup is very fast and easy to archive (a direct physical copy).

Its disadvantages are that it can only restore the database to a single point in time, the database can do no other work during the dump, and recovery cannot be done per table or per user.

Dynamic dump: Hot backup, which allows access and modification operations to the database during the dump, so the dump and user transactions can be executed concurrently;

Its advantages are that backups can be made at the tablespace or database-file level, the database remains usable during the backup, and recovery can reach second-level granularity.

Its disadvantage is that it leaves little room for error; if a hot backup fails, the result is almost entirely unusable.

Full backup: Back up all data.

Differential backup: only back up data that has changed since the last full backup.

Incremental backup: Backs up data that has changed since the last backup.
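Backup commands are DBMS-specific; a minimal sketch in SQL Server's T-SQL syntax (the database name and paths are hypothetical):

```sql
-- Full backup: all data.
BACKUP DATABASE SchoolDB TO DISK = 'D:\backup\school_full.bak';

-- Differential backup: only changes since the last full backup.
BACKUP DATABASE SchoolDB TO DISK = 'D:\backup\school_diff.bak' WITH DIFFERENTIAL;
```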

Log file: During transaction processing, the DBMS writes the start of the transaction, the end of the transaction, and every operation of inserting, deleting, and modifying the database into the log file. Once a failure occurs, the recovery subsystem of the DBMS uses the log file to undo the changes made to the database by the transaction and roll back to the initial state of the transaction.

Distributed database

Local databases are located in different physical locations, and a global DBMS is used to network and manage all local databases. This is a distributed database. Its architecture is shown in the figure below:
[Figure: distributed database architecture]

Sharding mode

Horizontal sharding: the rows (records) of a table are stored at different sites.

Vertical sharding: the columns of a table are stored at different sites.
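A minimal sketch of the two sharding modes using CREATE TABLE ... AS SELECT (supported by e.g. MySQL, PostgreSQL and Oracle; the campus, Phone and Addr columns are hypothetical):

```sql
-- Horizontal sharding: split rows by a predicate, same columns at each site.
CREATE TABLE Student_Beijing  AS SELECT * FROM Student WHERE campus = 'Beijing';
CREATE TABLE Student_Shanghai AS SELECT * FROM Student WHERE campus = 'Shanghai';

-- Vertical sharding: split columns, repeating the primary key in every fragment.
CREATE TABLE Student_Basic   AS SELECT Sno, Sname, Sex  FROM Student;
CREATE TABLE Student_Contact AS SELECT Sno, Phone, Addr FROM Student;
```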

Distribution Transparency
Sharding transparency: users or applications do not need to know how the logical table they access is fragmented and stored.

Location transparency: The application does not care about changes in the physical location of data storage.

Logical transparency: The user or application does not need to know which data model is used locally.

Replication transparency: The user or application does not care where the replicated data comes from.

Data warehouse

A data warehouse is a special kind of database: it also stores data in database form, but its purpose is different. After a database has been running for a long time, it accumulates more and more data, which hurts system performance, and for most applications data from long ago is no longer needed, so it could be deleted to reduce volume and improve efficiency. Since deleting such data outright would be a waste, it is usually extracted from the database and saved in another database, which is called a data warehouse.

It follows that the purpose of a data warehouse is not day-to-day application: it is subject-oriented and used for data analysis and for integrating data from different tables. It is relatively stable and generally not modified, while large volumes of data are loaded at specific points in time to reflect historical change. From this, the formation process of a data warehouse can be summarized as shown in the following figure:
[Figure: formation process of the data warehouse]

Data mining

As the figure above shows, once the data warehouse is built it serves two purposes. One is data query, analysis, and report generation. The other is to use data mining tools on the historical data to find relationships between the data and uncover residual value.

Data mining analysis methods

Correlation analysis: Correlation analysis is mainly used to find the correlation between different events, that is, when one event occurs, another event often occurs.

Sequence analysis: Sequence analysis is mainly used to discover events that occur one after another within a certain time interval. These events constitute a sequence, and the discovered sequence should have universal significance.

Classification analysis: analyze the characteristics of samples whose categories are known, in order to obtain rules or methods for deciding which category a sample belongs to. In classification analysis, each record is first given a label (one of a set of categories with different characteristics), i.e. the records are classified by label; these labelled records are then examined and their characteristics described.

Cluster analysis: following the principle that "birds of a feather flock together", cluster analysis groups unlabelled samples into different clusters and describes each cluster.

Business Intelligence BI

A BI system mainly comprises four stages: data preprocessing, building the data warehouse, data analysis, and data presentation. Data preprocessing is the first step in integrating an enterprise's raw data; it consists of the three ETL processes of data extraction, transformation and loading;

Establishing a data warehouse is the basis for processing massive data;

Data analysis is where the system's intelligence lies, generally relying on two major technologies: online analytical processing (OLAP) and data mining. OLAP not only summarizes and aggregates data but also provides analysis functions such as slice, dice, drill-down, roll-up and pivot, letting users easily perform multidimensional analysis on massive data. Data mining aims to uncover the knowledge hidden behind the data, build analytical models through methods such as association analysis, clustering and classification, and predict future trends and the problems the enterprise will face;

As data volumes and analysis methods grow, data presentation mainly ensures that the system's analysis results are presented visually.

Denormalization techniques

As introduced earlier, normalization prevents insertion, update and deletion anomalies as well as data redundancy, which is generally achieved by decomposing the schema into smaller tables.

However, once the tables are split, the anomalies above are solved but querying becomes harder: each query may have to join many tables, which can seriously reduce query efficiency. Denormalization techniques are therefore sometimes needed to improve query performance.

Technical means include: adding derived redundant columns, adding redundant columns, restructuring tables, and splitting tables. The main idea is to add redundancy in exchange for query efficiency, the reverse of normalization.
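A minimal sketch of adding a redundant column, assuming the S and SC tables from the relational-model section (MySQL/PostgreSQL-style syntax; the redundant column must then be kept in sync by the application or by triggers):

```sql
-- Grade listings normally require a join of SC with S to show student names.
-- Adding a redundant Sname column to SC avoids that join at the cost of redundancy.
ALTER TABLE SC ADD COLUMN Sname VARCHAR(20);
UPDATE SC SET Sname = (SELECT Sname FROM S WHERE S.Sno = SC.Sno);
```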

Big Data

Characteristics: massive volume, variety, low value density, and velocity.

The comparison between big data and traditional data is as follows:
[Table: comparison of big data and traditional data]

To process big data, an integrated platform called a big data processing system is generally used; it has the following characteristics:

Highly scalable, high performance, highly fault-tolerant, supports heterogeneous environments, short analysis latency, easy-to-use and open interfaces, lower cost, backward compatibility.

SQL language

Syntax keywords in the SQL language are case-insensitive.
Create a table: CREATE TABLE; delete a table: DROP TABLE; modify a table: ALTER TABLE;
Specify a primary key: PRIMARY KEY(); specify a foreign key: FOREIGN KEY();
Index: INDEX; view: VIEW;


SELECT … FROM table WHERE … GROUP BY … HAVING … ORDER BY … LIMIT …
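A minimal sketch of that clause order on the S and SC relations defined earlier (the department value 'CS' is hypothetical; LIMIT is MySQL/PostgreSQL syntax):

```sql
-- Average grade per course for students in the CS department,
-- keeping only courses whose average is at least 60, highest first.
SELECT SC.Cno, AVG(SC.Grade) AS avg_grade
FROM SC
JOIN S ON S.Sno = SC.Sno
WHERE S.SD = 'CS'
GROUP BY SC.Cno
HAVING AVG(SC.Grade) >= 60
ORDER BY avg_grade DESC
LIMIT 10;
```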

Origin blog.csdn.net/chengsw1993/article/details/125987932