National Computer Rank Examination Level 3 Database Technology (3)

Chapter 3_Database Structure Design

Exam analysis

In the exam, this chapter usually appears in multiple-choice questions and design questions (drawing ER diagrams, converting them to a relational model).
Commonly tested knowledge points are:
1. Master the modeling methods of ER and IDEF1X series in data modeling methods.
2. Master the logical design of the database - the conversion method from the ER diagram to the relational model
3. Master the physical structure, index classification, and index establishment principles in the physical design of the database.

Test point 1: Database conceptual design

Database conceptual design mainly addresses data requirements: how to understand them accurately, and how to organize, define, and clearly describe the data to be processed in the application domain, so as to support the later stages of database design.

1. Objectives of the database conceptual design phase
Define and describe the data scope involved in the application domain
Obtain the information model of the application domain or problem domain
Describe the attribute characteristics of the data
Describe the relationships between the data
Define and describe the constraints on the data
Explain the security requirements of the data
Support the various data processing needs of users
Ensure that the information model can be transformed into the logical structure of the database (that is, the database schema) and is also easy for users to understand.

2. Basis and process of conceptual design

Basis:
  Documents in the requirements analysis phase, including requirements specifications, functional models (data flow diagrams or IDEF0 diagrams), and various reports in the application domain or problem domain collected during the requirements phase.

Process:
(1) Clarify the modeling goal (model coverage)
(2) Define the entity set (identify and define the entity set from the bottom up)
(3) Define the relationship (association relationship between entities)
(4) Establish information model (construct ER Model)
(5) Determine entity set attributes (attributes describe the characteristics or properties of an entity set)
(6) Integrate and optimize information models (check and eliminate naming inconsistencies, structure inconsistencies, etc.)
  Conceptual design is the core stage of database design. A conceptual data model is an abstraction and simulation of the real world.

3. Data modeling method

The ER model intuitively abstracts the attributes, characteristics and relationships of objective objects in the real world with simple graphics.

The common characteristics of data modeling methods are:
1. It can truly and objectively describe the data in the real world and the relationship between the data.
2. There are few concepts that make up the model, the semantics are clear, and it is easy to understand.
3. The semantics of different concepts do not overlap, and the concepts have no ambiguity.
4. Describe the data in a graphical way, the data is intuitive and easy to understand, which is conducive to the communication between database designers and users.
5. This data model is easily converted into a data structure in the database logic design stage.

ER modeling method:
The entity relationship (ER) method is oriented to modeling data storage requirements, and abstracts the data that needs to be processed in the real world into a certain information structure. This structure does not depend on a specific computer system, and only describes the attributes and characteristics of data and the relationship between data from storage requirements.

Basic concepts related to ER model:
Entity or instance (Instance)
  The things that exist objectively and can be distinguished from each other are called entities.
  Such as student Zhang San, worker Li Si, computer department, introduction to database.
Entity Set (Entity Set)
  A collection of entities of the same type is called an entity set.
  Like all students.
Attribute (Attribute)
  A characteristic of an entity. An entity can be characterized by several attributes. The range of values for each attribute is called its domain.
  For example, a student can be described by student number, name, age, department, grade, etc.
Key (Key):
  An attribute or combination of attributes that uniquely identifies each entity in an entity set.
  The key used to distinguish different entities in the same entity set is called the primary key.
  Any two entities in an entity set cannot have the same primary key value.
  For example, the student number is the primary key of the student entity set.
Relationship (Relationship)
  Describes the interrelationships between entities.
  For example, there is a teaching relationship between students and teachers, and a class-monitor relationship between students and students.
  Relationships can also have attributes. For example, there is a course selection relationship between a student and a course, and each course selection has a grade as its attribute.
  A collection of relationships of the same type is called a relationship set.

There are three types of relationships between entities:

The cardinality of a relationship is the number of entities in another entity set that one entity can be associated with through a relationship set.
One-to-one relationship (1:1)
  such as "department" and "department director" (a department has only one department director, and one department director is responsible for only one department)
One-to-many relationship (1:n)
  such as "department" and "students" (a department enrolls several students, and a student belongs to only one department)
Many-to-many relationship (m:n)
  such as "students" and "courses" (a student can take multiple courses, and each course can be selected by multiple students)

Representation of ER model
ER diagram example

IDEF1X modeling method

IDEF1X focuses on the analysis, abstraction, and generalization of data requirements in the application domain, and is known as a data modeling method.
The modeling elements of IDEF1X: entity sets and relationships.

Entity Set
Independent Identifier Entity Set or Independent Entity Set: Each instance of an entity set can be uniquely identified independent of its relationship to other entity sets.
Dependent Identifier Entity Set or Dependent Entity Set: The uniqueness of an instance of an entity set depends on the relationship of the entity set to other entity sets.
a. Independent Entity Set/Number, b. Dependent Entity Set/Number
In IDEF1X, each entity set definition has a unique name and number, separated by a slash (/) and written above the rectangular box; the number should be a positive integer.

Relationship
A "deterministic connection relationship" (or simply "parent-child relationship" or "dependency relationship") is a connection or relationship between entity sets. In this connection relationship, each instance of the so-called parent entity set is connected to 0, 1 or more instances of the child entity set. Each instance in the child entity set is associated with exactly one instance of the parent entity set.

Classification of relationships:
1. Identifying relationships
a. Example of an identifying relationship
2. Non-identifying relationships
b. Example of a non-identifying relationship
3. Categorization relationships
c. Example of a categorization relationship
4. Non-specific (non-deterministic) relationships
d. Example of a non-specific relationship

IDEF1X and IDEF0 (introduced in Chapter 2) belong to the same family of modeling tools.
 IDEF0 is a functional modeling method.
 IDEF1X is a data modeling method.

5. Conceptual design example (shopping mall operation management system)
Modeling objective: support customer management, procurement and inventory management, sales management, human resource management, financial management, and other business activities.
Define entity sets: customers, membership cards, employees, cash registers, sales documents, suppliers, commodities, purchase and storage documents (corresponding to the 7 rectangular boxes on page 43 of the tutorial)
Define relationships (the difficult part): defined according to semantic constraints.
Establish the information model (see Figures 3.8 and 3.9 on pages 43-44 of the tutorial for details)
Determine entity attributes
Integrate and optimize the information model

Test point 2: Database logical design (to be completed)

Basis for logical design:
  The information model and the database conceptual design specification, which are also the basis on which database users confirm the data requirements.
The task of logical design:
  Convert the conceptual model (such as an ER diagram) into a data model supported by the DBMS (such as the relational model), and optimize it.
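
The conversion task can be illustrated with a minimal sketch. It assumes the commonly taught rule that each entity set becomes a relation and an m:n relationship becomes a relation of its own, keyed by the keys of both participants. The table and column names (student, course, sc) are illustrative, and SQLite stands in for the target DBMS only because it ships with Python.

```python
import sqlite3

# In-memory database; SQLite is only a stand-in for the target DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
-- Each entity set becomes a relation; its key becomes the primary key.
CREATE TABLE student (sno TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (cno TEXT PRIMARY KEY, title TEXT NOT NULL);

-- The m:n "course selection" relationship becomes its own relation.
-- Its key combines the keys of both entity sets, and the relationship
-- attribute (grade) becomes an ordinary column.
CREATE TABLE sc (
    sno   TEXT NOT NULL REFERENCES student(sno),
    cno   TEXT NOT NULL REFERENCES course(cno),
    grade INTEGER,
    PRIMARY KEY (sno, cno)
);
""")

conn.execute("INSERT INTO student VALUES ('S1', 'Zhang San')")
conn.execute("INSERT INTO course VALUES ('C1', 'Introduction to Database')")
conn.execute("INSERT INTO sc VALUES ('S1', 'C1', 90)")

# A selection that refers to a non-existent student violates the foreign key.
try:
    conn.execute("INSERT INTO sc VALUES ('S9', 'C1', 75)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

A 1:n relationship usually needs no table of its own: the key of the "one" side is added as a foreign key column to the relation on the "many" side.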

The basis and stage objectives of logical design:
(Figure: basis and stage objectives of database logical design)
Supplementary concepts
Relational model
  There are three main data models: the hierarchical model, the network model, and the relational model. The relational model is simple and flexible, rests on a solid theoretical foundation, and has become the most popular data model.
  The relational model uses a two-dimensional table structure to represent entities and the relationships between entities.
  The description of a relation is called a relation schema. A relation schema has five components, that is, it is a five-tuple: R(U, D, DOM, F)
  R: the relation name
  U: the set of attribute names that make up the relation
  D: the domains from which the attributes in U take their values
  DOM: the mapping from attributes to domains
  F: the set of data dependencies on the attribute set U
  Since D and DOM have little bearing on schema design, the relation schema is simplified here to a triple:
  R<U, F>. A relation r on U is called a relation of the relation schema R<U, F> if and only if r satisfies F.

1. The core of relational database design: the design of relational schema.
2. The design goal of the relational schema: According to certain principles, construct a set of relational schemas that can better reflect the real world and have good operational performance from a large number of interrelated data.
New Orleans method, database design steps:
requirements analysis—>conceptual structure design—>logical structure design—>physical structure design
ER diagram relational schema design

Data Dependency
Definition: 
  Let R(U) be a relation schema on an attribute set U, and let X and Y be subsets of U. If, for any possible relation r of R(U), no two tuples in r can have equal values on X but unequal values on Y, then we say that "X functionally determines Y" or "Y functionally depends on X", denoted X→Y.
Data dependency
  A constraint relationship between the attributes within a relation
  An abstraction of the relationships between attributes in the real world
  An intrinsic property of the data
  An embodiment of semantics
Forms of integrity constraints
  Limiting the value range of an attribute, such as age<60
  Defining interrelationships between attribute values (mainly whether the values are equal); this is data dependency

Types of data dependency
Functional dependency (FD)
  Functional dependencies are ubiquitous in everyday life. They resemble a function y=f(x) in mathematics: once the independent variable x is determined, the corresponding value y is uniquely determined.
  For example, in the relation citizen (ID number, name, address, work unit), once the ID number is determined, the address is uniquely determined, so the address functionally depends on the ID number.
  Once the name is determined, however, the address may not be determined.
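
The definition can be checked mechanically against sample data. The sketch below is only an illustration: the function name holds_fd and the sample rows are made up, and a check on one relation instance can refute a functional dependency but cannot prove that it holds for every possible relation of the schema.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # value on lhs -> value on rhs first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new x, otherwise returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample of the citizen relation (two people share a name).
citizens = [
    {"id": "A1", "name": "Zhang San", "address": "Beijing"},
    {"id": "A2", "name": "Zhang San", "address": "Shanghai"},
    {"id": "A3", "name": "Li Si", "address": "Beijing"},
]

id_determines_address = holds_fd(citizens, ["id"], ["address"])
name_determines_address = holds_fd(citizens, ["name"], ["address"])
```

In this sample the ID number determines the address, while the name does not, because the two citizens named Zhang San live at different addresses.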

Multivalued dependency (MVD)
  The teacher number may be multivalued dependent on the course number: given a combination of (course number, reference book number), there may be several corresponding teacher numbers, because several teachers can teach the same course using the same or different reference books.
  Put simply, a functional dependency determines a value uniquely, whereas a multivalued dependency does not.

Several special cases of functional dependency
1. Trivial and non-trivial functional dependencies
  If X→Y and Y⊈X, then X→Y is called a non-trivial functional dependency.
  If Y⊆X, then X→Y is called a trivial functional dependency.
  Because X→Y always holds when Y⊆X, a trivial functional dependency carries no information, so "functional dependency" generally refers to a non-trivial one.

Example: Sno represents the student's student number, Cno represents the course number, and Grade represents the grade.
  In the relation SC (Sno, Cno, Grade),
  non-trivial functional dependencies: (Sno, Cno) → Grade
  trivial functional dependencies: (Sno, Cno) → Sno
(Sno, Cno) → Cno

2. Full and partial functional dependencies
  If X → Y, and for every proper subset X' ⊂ X we have X' ↛ Y, then Y is said to be fully functionally dependent on X, denoted X -F→ Y.
  If X → Y, but Y is not fully functionally dependent on X, then Y is said to be partially functionally dependent on X, denoted X -P→ Y.

Example: course selection (student number, course number, course name, grade)
  (student number, course number) -F→ grade
  (student number, course number) -P→ course name, because course number → course name
Inference: if X → Y and X is a single attribute, then X -F→ Y
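
Full and partial dependence can be told apart with the attribute closure, a standard technique that this section does not name. The sketch below assumes the two dependencies of the course selection example as the given set; the function names closure and dependency_kind are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dependency_kind(x, y, fds):
    """Classify X -> Y as 'full', 'partial', or 'none'."""
    x, y = set(x), set(y)
    if not y <= closure(x, fds):
        return "none"
    # Partial if some proper, non-empty subset of X already determines Y.
    for size in range(1, len(x)):
        for subset in combinations(sorted(x), size):
            if y <= closure(subset, fds):
                return "partial"
    return "full"


# Dependencies of course selection (student number, course number, course name, grade).
fds = [
    ({"student_no", "course_no"}, {"grade"}),
    ({"course_no"}, {"course_name"}),
]

kind_grade = dependency_kind({"student_no", "course_no"}, {"grade"}, fds)
kind_course_name = dependency_kind({"student_no", "course_no"}, {"course_name"}, fds)
```

Here kind_grade is "full" and kind_course_name is "partial". A single-attribute X has no proper non-empty subset, so the inner loops never run and the result is "full", which matches the inference above.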

3. Transitive functional dependency
  If X→Y, Y→Z, Y⊈X, and Y↛X, then Z is said to be transitively functionally dependent on X, denoted X -transitive→ Z.
Example: student (student number, name, department name, department head)
  The department head is transitively functionally dependent on the student number, because student number → department name and department name → department head.

Thinking question: Given the relation schema R (student number, course name, student major number, major name, grade), what kind of dependency is each of the following?

(student number, course name, student major number) → grade
student number → major name
(student number, major name) → grade
(student number, course name) → grade
(course name, major name, grade) → (course name, grade)

Answers, in order:
(functional dependency, partial functional dependency)
(functional dependency, transitive functional dependency)
(not a functional dependency)
(full functional dependency)
(trivial functional dependency)

Candidate key, primary key, foreign key
  If the value of an attribute group uniquely determines the value of the entire tuple, and no proper subset of the group does, then the attribute group is called a candidate key (candidate code).
  For example, in (student number, name, gender, age), student number is a candidate key; (student number, name) is not, because it is not minimal; and gender is not, because it does not determine the tuple.
  If there are multiple candidate keys, one of them is selected as the primary key.
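
Candidate keys can be enumerated from the functional dependencies. The sketch below is a brute-force illustration suitable only for small schemas; it assumes that student number determines all other attributes, and the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    all_attrs = set(all_attrs)
    keys = []
    # Try smaller sets first so that supersets of a found key can be skipped.
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue  # contains a key already, so it is not minimal
            if closure(combo, fds) == all_attrs:
                keys.append(combo)
    return keys


student_attrs = {"student_no", "name", "gender", "age"}
student_fds = [({"student_no"}, {"name", "gender", "age"})]  # assumed dependency

keys = candidate_keys(student_attrs, student_fds)
```

Under this assumption the only candidate key is {student_no}. The set {student_no, name} also determines every attribute, but it is skipped because it contains a smaller key.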

If an attribute or attribute group X is not a key of relation schema R (neither the primary key nor a candidate key), but X is a key of another relation schema, then X is called a foreign key (external key) of R.
  For example, in SC (Sno, Cno, Grade), Sno alone is not a key, but Sno is the key of relation schema S (Sno, Sdept, Sage), so Sno is a foreign key of relation schema SC.

Data normalization
  Designing a relational database is mainly a matter of designing relation schemas, and the quality of that design directly affects the success or failure of the database design. Normalization is the essential route to a better relation schema.
  Normalization of relation schemas is carried out mainly by means of normal forms.
  Normalization of a relation schema: the process of decomposing a relation schema in a lower normal form into relation schemas in a higher normal form.
  The normalization theory of relational databases is a tool for database logical design.
  Purpose: to eliminate, as far as possible, insertion and deletion anomalies, complicated updates, and data redundancy.

Normal forms
  The constraints that a relation schema satisfies are called normal forms. In increasing degree of normalization, they are 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF.
  1NF: If all attributes of relation schema R are indivisible basic data items, then R is said to be in first normal form, R∈1NF.
  2NF: If relation schema R∈1NF and every non-prime attribute is fully functionally dependent on the primary key, then R is said to be in second normal form, R∈2NF.
  Example: Determine whether R (student number, name, age, course name, grade, credits) is in second normal form.

  Primary key: (student number, course name)
  Non-prime attributes: name, age, grade, credits
  The following dependencies hold: (student number, course name) → (name, age, grade, credits), but also
  (course name) → (credits)
  (student number) → (name, age)
  Credits, name, and age therefore depend on only part of the primary key, so R is not in 2NF.
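
One common way to remove the partial dependencies is to project R onto three smaller relations, one per determinant. The sketch below is an illustration under that assumption; the sample rows, the helper name project, and the resulting relation names are made up.

```python
def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        tuple_ = {a: row[a] for a in attrs}
        if tuple_ not in result:
            result.append(tuple_)
    return result


# Hypothetical instance of R; name, age, and credits are stored repeatedly.
r = [
    {"student_no": "S1", "name": "Zhang San", "age": 20,
     "course_name": "Database", "grade": 90, "credits": 4},
    {"student_no": "S1", "name": "Zhang San", "age": 20,
     "course_name": "Networks", "grade": 85, "credits": 3},
    {"student_no": "S2", "name": "Li Si", "age": 21,
     "course_name": "Database", "grade": 78, "credits": 4},
]

# student number -> (name, age)
students = project(r, ["student_no", "name", "age"])
# course name -> credits
courses = project(r, ["course_name", "credits"])
# (student number, course name) -> grade
enrollments = project(r, ["student_no", "course_name", "grade"])
```

After the split each student and each course is stored once, so changing the credits of a course touches a single tuple instead of every enrollment row.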
  3NF: If relation schema R∈2NF and no non-prime attribute of R is transitively dependent on the primary key of R, then R is said to be in third normal form, R∈3NF.
  Example: Determine whether R (student number, name, age, college, college location, college phone number) is in third normal form.
  Primary key: (student number)
  Non-prime attributes: name, age, college, college location, college phone number
  The non-key fields "college location" and "college phone number" are transitively functionally dependent on the key field "student number" (student number → college, and college → college location and college phone number), so R is not in 3NF.

Test point 3: Database physical design
1. Overview of physical design
Through conceptual design and logical design, the relation schemas have been normalized. The purpose of physical design is to turn the logical description of the data into an implementation specification; its goal is a data storage scheme that delivers adequate performance and ensures the integrity, security, and recoverability of the data.

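At this stage, a data storage scheme is chosen, based on the volume of data stored in the database and on how users need to use it, so as to speed up data retrieval.

Physical design requires an understanding of the different file organizations and indexing techniques, and of how they are used.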

2. The physical structure of the database

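A database's application data is stored as files on external storage media (such as disks). Logically, a file is organized as a sequence of records; each DB file can be viewed as a collection of logical records.

A physical file can be viewed as a series of disk blocks that hold the file's records. The mapping between a file's logical records and the disk is managed by the operating system or the DBMS.

Problems to be solved from the perspective of the database's physical structure:
File organization
File structure
File access
Indexing techniques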

3. Indexing
(1) Indexing technology
Indexing is a technique for fast data access. It directly links the value of each record on one or more fields with the record's physical address, providing a mechanism for quickly accessing file records by field value. The key to indexing is to establish the mapping between record field values and record physical addresses (this mapping is called an index).
(2) Index technology classification
2.1 Ordered index (index file mechanism)
  The index file mechanism uses an index file (made up of index records) to implement the mapping between the value of a record field (the search key, or sort field) and the physical address of the record.
  The data file (the indexed file, or main file) and the index file (a collection of index records, or index entries) are the two components of ordered indexing; the data file often uses a sequential file structure.

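As an indexing technique based on index files, an ordered index raises two key questions:
1. How to organize the index records in the index file;
2. How to reach the data records in the data file starting from the index file.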

An index file is created as follows:
1. First select one or more record fields in the data file as the search key.
2. Then establish the mapping between each data record's value on the search key and the record's physical address, forming index entries.
3. The index file stores the index records in ascending or descending order of the search key value, and is itself organized as a sequential file.

The index is built on the search key. If the records of a data file need to be queried in several ways, multiple search keys can be defined and an index file created for each one, so a data file can have multiple search keys and multiple index files.
Types of ordered index:
(1) Clustered index (index entries are arranged in the same order as the data records) and non-clustered index (index entries are not arranged in the same order as the data records). Only one clustered index can be created for a data file, but multiple non-clustered indexes can be created.
(2) Dense index (every search key value in the data file has an index record) and sparse index (only some search key values have an index record).
(3) Primary index (built on the primary key attribute set) and secondary index (built on non-primary-key attributes).
(4) Unique index (the index column contains no duplicate values).
(5) Single-level index (a linear index: the index entries are arranged in sequence and point directly to the data records in the data file) and multi-level index (used for large data files; a multi-level tree index such as a B-tree or B+ tree locates records quickly).
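
The difference between a dense and a sparse index can be sketched in a few lines. The example below is a toy model, not how a DBMS lays out storage: the data file is a list of blocks sorted by search key, a physical address is a (block number, slot) pair, and all names are illustrative.

```python
from bisect import bisect_left, bisect_right

# Toy sequential data file: records sorted by search key, grouped into blocks.
blocks = [
    [(1, "a"), (3, "b"), (5, "c")],
    [(7, "d"), (9, "e"), (11, "f")],
    [(13, "g"), (15, "h"), (17, "i")],
]

# Dense index: one sorted entry per search key value, pointing at (block, slot).
dense_keys = [key for block in blocks for key, _ in block]
dense_addrs = [(b, s) for b, block in enumerate(blocks) for s in range(len(block))]

# Sparse index: one entry per block, holding the first key stored in that block.
sparse_keys = [block[0][0] for block in blocks]


def lookup_dense(key):
    """Binary-search the index, then follow the address straight to the record."""
    i = bisect_left(dense_keys, key)
    if i == len(dense_keys) or dense_keys[i] != key:
        return None
    b, s = dense_addrs[i]
    return blocks[b][s][1]


def lookup_sparse(key):
    """Find the last block whose first key is <= key, then scan that block."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return value
    return None
```

The sparse index holds three entries instead of nine, at the price of a short scan inside the chosen block. It works only because the data file is sorted on the search key, which is why a sparse index is normally a clustered index.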

2.2 Hash index (hash index mechanism)
  The hash (Hash) index mechanism uses the hash function to realize the direct mapping relationship between the value of the record field and the physical address of the record.
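
A minimal sketch of the idea follows, assuming integer keys, a modulo hash function, and buckets that chain colliding records. The bucket count and the sample records are arbitrary choices for illustration.

```python
NUM_BUCKETS = 4  # arbitrary size chosen for the illustration


def bucket_of(key):
    """Hash function: maps a search key value directly to a bucket number."""
    return key % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(key, record):
    """Store the record in the bucket that the hash function selects."""
    buckets[bucket_of(key)].append((key, record))


def find(key):
    """Compute the bucket from the key, then scan only that bucket."""
    for k, record in buckets[bucket_of(key)]:
        if k == key:
            return record
    return None


# Hypothetical student records keyed by student number.
for student_no, name in [(1001, "Zhang San"), (1002, "Li Si"), (1005, "Wang Wu")]:
    insert(student_no, name)
```

Keys 1001 and 1005 hash to the same bucket, so find has to compare keys inside the bucket. Unlike an ordered index, this structure supports equality lookups well but gives no help with range queries.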

4. The physical design content of the database
  The physical structure design of the database is to design the appropriate physical structure of the database according to the logical design results of the database under the constraints of the specific hardware environment, operating system and DBMS. The goal is to obtain a database physical model with less storage space, high data access efficiency and low maintenance cost.

Database physical design mainly includes five steps.
(1) Database logical schema description
The database logical design produces the logical structure of the database, including the relational schema of the database, integrity constraints on the relational schema, and business rules for specific applications. These contents are independent of the specific target DBMS platform adopted by DBAS.

Physical design must then derive, from this logical structure, the schema of the relational tables (here called basic tables) supported by the target DBMS. This schema information represents the structure of the specific target database to be developed.
  
The main design content of the logical schema description:
Describe the basic tables and views for the target database.
Use the table creation syntax supported by the target DBMS to describe the basic tables and the integrity constraints that meet the application requirements.
Design business rules for the basic tables.
Use the integrity control mechanisms provided by the target DBMS to design the application-oriented integrity constraints that the basic tables must satisfy.

For example, SQL Server uses the T-SQL language.
  Choose an appropriate file structure (heap, sequential, clustered, index, or hash) for each basic table.
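
As an illustration of describing a basic table, its integrity constraints, and a view, the sketch below uses SQLite through Python rather than T-SQL, because SQLite needs no server. The employee table, its columns, and the view are hypothetical; the CHECK constraint reuses the age<60 example from the data dependency section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the target DBMS

# Basic table with entity integrity (PRIMARY KEY) and
# user-defined integrity (NOT NULL, CHECK).
conn.execute("""
CREATE TABLE employee (
    emp_no TEXT PRIMARY KEY,
    name   TEXT NOT NULL,
    age    INTEGER CHECK (age < 60)
)
""")

# A view exposes part of the basic table to applications.
conn.execute("CREATE VIEW employee_names AS SELECT emp_no, name FROM employee")

conn.execute("INSERT INTO employee VALUES ('E1', 'Li Si', 45)")

# The DBMS itself rejects a row that breaks the business rule.
try:
    conn.execute("INSERT INTO employee VALUES ('E2', 'Zhang San', 65)")
    check_enforced = False
except sqlite3.IntegrityError:
    check_enforced = True

visible = conn.execute("SELECT emp_no, name FROM employee_names").fetchall()
```

Declaring the rule in the table definition means every application that writes to the table is subject to it, which is the point of using the integrity control mechanism of the DBMS.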
(2) File organization and access design
  Basic principle
  Depending on the application, store the volatile part separately from the stable part, and the frequently accessed part separately from the rarely accessed part, to improve system performance.
  
Analyze and understand the access characteristics of database transactions:
  Use a transaction-basic table cross-reference matrix
  Estimate the execution frequency of each transaction
  Summarize the transaction operation frequency information for each basic table
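
The three steps above can be sketched as a small calculation. The transactions, the tables they touch, and the daily frequencies below are invented for the shopping mall example and carry no real measurements.

```python
from collections import Counter

# Cross-reference matrix in compact form: for each transaction, the basic
# tables it accesses and its estimated executions per day (hypothetical).
transactions = {
    "register_sale": {"tables": ["sales_document", "commodity"], "per_day": 5000},
    "receive_goods": {"tables": ["purchase_document", "commodity"], "per_day": 200},
    "enroll_member": {"tables": ["customer", "membership_card"], "per_day": 50},
}

# Summarize the access frequency of each basic table across all transactions.
table_frequency = Counter()
for info in transactions.values():
    for table in info["tables"]:
        table_frequency[table] += info["per_day"]

# Tables ordered from most to least frequently accessed.
hottest_first = [table for table, _ in table_frequency.most_common()]
```

With these numbers the commodity table is accessed most often, which makes it the first candidate for careful index and storage design.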

Consider placing tables and indexes on separate disks. During a query the two disk drives work independently, so physical reads and writes are relatively fast.

Database file structure
Base tables can be indexed on some attributes

(3) Data distribution design
Physical distribution of different types of data
  Reasonably arrange application data (basic tables), indexes, logs, database backup data, etc. in different media.

Division and distribution of application data
  By usage characteristics (frequently used partitions and infrequently used partitions)
  By time and location (data for the same time or location belongs to the same partition)
  Data division in a distributed database system (DDBS) (horizontal partition or vertical partition)
  Distribution of derived attribute data (add derived columns, or do not define derived attributes)
  Denormalization of relation schemas (lower the degree of normalization to improve query efficiency)

Horizontal partition
  Divide the basic table into multiple sub-tables with identical attributes and structure. The tuples in each sub-table are a subset of the tuples in the basic table.
  For example, dividing commodities by production year is a horizontal partition.

Vertical partition
  Divide the basic table into multiple sub-tables, and the attributes contained in each sub-table are a subset of the original basic table.
  For example, the commodity table (commodity number, product name, unit price, inventory, sales unit price, remarks)
  can be divided vertically into two sub-tables:
  commodity table (commodity number, product name, sales unit price)
  commodity table (commodity number, unit price, inventory, remarks)
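
Both kinds of partition can be sketched on an in-memory copy of the commodity table. The rows are invented, and the production_year column is an assumption added only so that the horizontal example has something to split on.

```python
from collections import defaultdict

# Hypothetical commodity rows; production_year is added for the horizontal split.
commodities = [
    {"commodity_no": "C1", "product_name": "Tea", "unit_price": 10.0,
     "inventory": 50, "sales_unit_price": 15.0, "remarks": "", "production_year": 2021},
    {"commodity_no": "C2", "product_name": "Rice", "unit_price": 4.0,
     "inventory": 200, "sales_unit_price": 6.0, "remarks": "", "production_year": 2022},
    {"commodity_no": "C3", "product_name": "Salt", "unit_price": 1.0,
     "inventory": 80, "sales_unit_price": 2.0, "remarks": "iodized", "production_year": 2022},
]


def horizontal_partition(rows, attr):
    """Split rows into sub-tables with the same columns, grouped by attr."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[attr]].append(row)
    return dict(parts)


def vertical_partition(rows, attrs):
    """Keep only the listed columns of every row."""
    return [{a: row[a] for a in attrs} for row in rows]


by_year = horizontal_partition(commodities, "production_year")

# The six columns named in the text; both sub-tables keep the key commodity_no.
base = vertical_partition(commodities, ["commodity_no", "product_name", "unit_price",
                                        "inventory", "sales_unit_price", "remarks"])
sales_part = vertical_partition(base, ["commodity_no", "product_name", "sales_unit_price"])
stock_part = vertical_partition(base, ["commodity_no", "unit_price", "inventory", "remarks"])

# Joining the sub-tables on the shared key restores the original rows.
stock_by_no = {row["commodity_no"]: row for row in stock_part}
rejoined = [{**row, **stock_by_no[row["commodity_no"]]} for row in sales_part]
```

Repeating the key in both vertical sub-tables is what lets them be joined back without loss; the horizontal sub-tables are recombined by a simple union.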

(4) Determine the system configuration
DBMS products generally provide some storage allocation parameters
  Number of users using the database at the same time
  Number of database objects opened at the same time
  Length and number of buffers used
  Time slice size
  Database size
  Filling factor
  Number of locks...

These parameter values need to be determined according to the application environment.
  The system provides reasonable default values for these parameters,
  but the defaults are not necessarily suitable for every application environment.
  Determine the parameter values case by case to optimize system performance.

(5) Physical model evaluation
  Evaluate the physical design results of the database from the aspects of access time, storage space, maintenance cost, etc., focusing on time and space efficiency.
  If the evaluation result meets the original design requirements, the design can enter the physical implementation stage; otherwise, the physical structure must be redesigned or modified, and sometimes it is even necessary to return to the logical design stage to modify the data model.


Origin blog.csdn.net/weixin_47288291/article/details/123519370