Database system: 1. Introduction



1.1 Overview of the database system


1.1.1 Basic concepts


Data

  • Data is the basic object stored in the database.
  • The symbolic records that describe things are called data.
  • Data comes in many forms, all of which can be digitized and stored in a computer.

The representation of data alone cannot fully convey its content; it needs to be interpreted. Data and its interpretation are inseparable.

The interpretation of data is the description of what the data means. This meaning is called the semantics of the data, and data and its semantics are inseparable.


Database (DB)

  • A database is a large, organized, shareable collection of data stored in a computer over the long term.

  • The data in a database is organized, described and stored according to a certain data model, and has the following characteristics:

    • Low redundancy
    • High data independence
    • Easy extensibility
    • Can be shared by various users

    In a nutshell, database data has three basic characteristics: permanent storage, organization and sharing.


Database Management System (DBMS)

A database management system is a layer of data management software that sits between the user and the operating system. Like the operating system, it is basic system software for the computer, and it is also a large and complex software system. Its main functions are as follows:

  • Data definition:
    • Provides a Data Definition Language (DDL) for defining the composition and structure of the data objects in the database.
  • Data organization, storage and management:
    • Classifies, organizes, stores and manages the various kinds of data;
    • Improves storage space utilization and eases access by providing a variety of access methods that raise access efficiency.
  • Data manipulation:
    • Provides a Data Manipulation Language (DML) for manipulating data, that is, for querying, inserting, deleting and modifying the data in the database.
  • Transaction management and run-time management:
    • The database management system uniformly manages and controls the database while it is being built, used and maintained;
    • It ensures that transactions run correctly, and guarantees data security, data integrity, concurrent use of the data by multiple users, and system recovery after a failure.
  • Database establishment and maintenance:
    • Includes loading and converting the initial data, dumping and restoring the database, reorganizing the database, and monitoring and analyzing performance.
  • Other functions:
    • Communication between the database management system and other software systems on the network;
    • Data conversion between one database management system and another database management system or a file system;
    • Mutual access and interoperation between heterogeneous databases.
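
To make the difference between DDL and DML concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is chosen only because it needs no setup; the table name, column names and sample rows are made up for illustration and do not come from the text above.

```python
import sqlite3

# In-memory database; the table and its contents are illustrative only.
conn = sqlite3.connect(":memory:")

# DDL: define the composition and structure of a data object.
conn.execute(
    "CREATE TABLE student (sno TEXT PRIMARY KEY, name TEXT NOT NULL, dept TEXT)"
)

# DML: insert, modify, delete and query the data.
conn.execute("INSERT INTO student VALUES ('2021001', 'Li Yong', 'CS')")
conn.execute("INSERT INTO student VALUES ('2021002', 'Liu Chen', 'Math')")
conn.execute("UPDATE student SET dept = 'CS' WHERE sno = '2021002'")
conn.execute("DELETE FROM student WHERE sno = '2021001'")

rows = conn.execute("SELECT sno, name, dept FROM student").fetchall()
print(rows)
```

The CREATE TABLE statement is the only DDL here; the other four statements are DML and cover the four operations named in the list (insert, modify, delete, query).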

Database system (DBS)

  • A database system is a system for storing, managing, processing and maintaining data. It consists of the database, the database management system (and its application development tools), the application programs, and the database administrator (DBA).
  • A database management system alone is far from enough to build, use and maintain a database; specialized personnel are also required, and they are called database administrators.



1.1.2 The emergence and development of data management technology


insert image description here


1.1.3 Features of the database system


  • Data structuring:

    • A database system structures the data as a whole, which is the essential difference between a database system and a file system.
    • "As a whole" means that not only is each piece of data internally structured, but the entire body of data is structured too, with relationships between the pieces of data.
  • High data sharing, low redundancy and easy extension:

    • Data can be shared by multiple users and multiple applications. Sharing greatly reduces data redundancy and saves storage space.
    • Sharing also avoids incompatibility and inconsistency between copies of the data.
  • High data independence:

    • Physical independence: the user's application programs are independent of the physical storage of the data in the database.
    • Logical independence: the user's application programs are independent of the logical structure of the database.
  • The data is uniformly managed and controlled by the database management system:

    • The DBMS guarantees data security and data integrity, and supports concurrency control and database recovery.

    Summary: A database is a large, organized, shareable collection of data stored in a computer over the long term. It can be shared by various users with minimal redundancy and high data independence. The database management system exercises unified control over the database while it is being built, used and maintained, to ensure the integrity and security of the data; it performs concurrency control when multiple users use the database at the same time, and it restores the database after a failure.


1.2 Data Model


1.2.1 Two categories of data models


A data model is itself a model: an abstraction of the characteristics of real-world data. The data model is the core and foundation of a database system.

A data model should meet three requirements:

  • It simulates the real world reasonably faithfully.
  • It is easy for people to understand.
  • It is easy to implement on a computer.

Conceptual model


The first category is the conceptual model, also known as the information model. It models data and information from the user's point of view, using simple symbols to describe information. There are no strict rules, as long as the model clearly reflects the information of the real world. It is mainly used for database design.


Logical model and physical model


The second category comprises logical models and physical models.

Logical models mainly include the hierarchical model, the network model, the relational model, the object-oriented data model, the object-relational data model and the semistructured data model. A logical model models data from the point of view of the computer system and is mainly used in implementing database management systems.

The physical model is the lowest-level abstraction of data. It describes how data is represented and accessed inside the system, or how it is stored and accessed on disk or tape, and it is oriented to the computer system. Implementing the physical model is the task of the database management system.



1.2.2 Conceptual Model


Basic concepts


  • Entity: Something that exists objectively and can be distinguished from other things is called an entity. It can be a concrete person, event or object, or an abstract concept.
  • Attribute: A characteristic of an entity is called an attribute. An entity can be described by several attributes.
  • Key: A set of attributes that uniquely identifies an entity is called a key.
  • Entity type: An entity name together with its set of attribute names, used to abstract and describe entities of the same kind, is called an entity type.
    • For example: student (student number, name, gender, date of birth, department, enrollment time)
  • Entity set: A collection of entities of the same type is called an entity set.
    • For example: all students of the 2021 School of Computer Science.
  • Relationship: In the real world, things are connected both internally and to one another. In the information world these connections appear as relationships within an entity (type) and relationships between entities (types). Relationships between entities come in several kinds, such as one-to-one, one-to-many and many-to-many.
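
As a rough illustration of these terms, the sketch below models the student entity type from the example as a Python dataclass and an entity set as a dictionary indexed by the key. The field names, the use of an enrollment year for "enrollment time", and the helper add_entity are assumptions made for this example only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Student:
    """Entity type: an entity name plus its attribute names."""
    student_number: str   # the key: uniquely identifies one student
    name: str
    gender: str
    date_of_birth: date
    department: str
    enrollment_year: int  # stands in for "enrollment time"


def add_entity(entity_set: dict, student: Student) -> None:
    """Add an entity to the entity set, refusing a duplicate key value."""
    if student.student_number in entity_set:
        raise ValueError(f"duplicate key: {student.student_number}")
    entity_set[student.student_number] = student


# Entity set: a collection of entities of the same type.
students = {}
add_entity(students, Student("2021001", "Li Yong", "male",
                             date(2003, 5, 1), "Computer Science", 2021))
add_entity(students, Student("2021002", "Liu Chen", "female",
                             date(2003, 9, 12), "Computer Science", 2021))

try:
    add_entity(students, Student("2021001", "Wang Min", "female",
                                 date(2002, 1, 3), "Mathematics", 2021))
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Each Student object is one entity, its fields are attributes, and the dictionary lookup by student_number is what makes the key "uniquely identify" an entity.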

A representation method for the conceptual model


Entity-relationship method

A conceptual model models the information world, so it should express the concepts above conveniently and accurately. The most commonly used representation is the entity-relationship (E-R) approach. It uses E-R diagrams to describe the conceptual model of the real world, and the E-R approach is also called the E-R model.


1.2.3 Components of a data model


In general, a data model is a strictly defined collection of concepts. These concepts precisely describe the static characteristics, the dynamic characteristics and the integrity constraints of the system. A data model therefore has three components:

  • Data structure.
  • Data manipulation.
  • Data integrity constraints.

Data structure


A data structure describes the objects that make up a database and the relationships between those objects.

The data structure describes two kinds of objects:

  • Objects related to the type, content and nature of the data, such as data items and records in the network model, or domains, attributes and relations in the relational model;

  • Objects related to the relationships between data, such as the set type in the network model.

    The data structure is the most important aspect in characterizing a data model. It is the collection of the object types described, and it describes the static characteristics of the system.


Data manipulation


Data manipulation is the collection of operations that may be performed on the instances (values) of the various objects (types) in the database, including the operations themselves and the rules that govern them.

A database has two main kinds of operations: query and update (insert, delete and modify).

The data model must define the exact meaning of these operations, their symbols, their rules (such as precedence) and the language that implements them. Data manipulation describes the dynamic characteristics of the system.


Data Integrity Constraints


Data integrity constraints are a set of integrity rules.

Integrity rules are the constraints and dependency rules that hold for the data and its relationships in a given data model. They restrict the database states, and the changes of state, that are allowed under the data model, so as to ensure that the data is correct, valid and compatible. The rules include:

  • Entity integrity.

  • Referential integrity.

  • User-defined integrity.

    Satisfying all three means satisfying the integrity constraints of the data.
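
The following sketch shows one way the three kinds of rule can look in practice, using Python's sqlite3 module. It assumes a relational setting, and the tables, columns and the 15 to 45 age range are chosen for illustration; the helper violates is hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
CREATE TABLE department (dname TEXT PRIMARY KEY NOT NULL);
CREATE TABLE student (
    sno   TEXT PRIMARY KEY NOT NULL,             -- entity integrity
    name  TEXT NOT NULL,
    age   INTEGER CHECK (age BETWEEN 15 AND 45), -- user-defined integrity
    dname TEXT REFERENCES department(dname)      -- referential integrity
);
INSERT INTO department VALUES ('CS');
INSERT INTO student VALUES ('2021001', 'Li Yong', 20, 'CS');
""")


def violates(sql: str) -> bool:
    """Return True if the DBMS rejects the statement as an integrity violation."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Duplicate key value: breaks entity integrity.
entity_violation = violates(
    "INSERT INTO student VALUES ('2021001', 'Zhang Li', 21, 'CS')")
# Department that does not exist: breaks referential integrity.
referential_violation = violates(
    "INSERT INTO student VALUES ('2021002', 'Liu Chen', 19, 'Physics')")
# Age outside the allowed range: breaks a user-defined rule.
user_defined_violation = violates(
    "INSERT INTO student VALUES ('2021003', 'Wang Min', 7, 'CS')")

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

All three inserts are rejected by the DBMS itself, so the database state stays valid without any checking code in the application.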


1.2.4 Common data models


  • Hierarchical model
  • Network model
  • Relational model
  • Object-oriented data model
  • Object-relational data model
  • Semistructured data model

The hierarchical model and the network model are collectively referred to as formatted models.


1.2.5 Hierarchical model


The hierarchical model is the earliest data model used in database systems. A hierarchical database system uses the hierarchical model to organize its data.


Data Structures for Hierarchical Models


A set of basic hierarchical relationships in a database that satisfies the following two conditions is called a hierarchical model:

  • There is one and only one node without a parent node; this node is called the root node;
  • Every node other than the root has one and only one parent node.

In the hierarchical model, each node represents a record type, and the relationship between record types is represented by a directed edge between the nodes. This relationship is a one-to-many relationship from parent to child. As a result, a hierarchical database system can directly handle only one-to-many entity relationships.

In the hierarchical model, child nodes of the same parent are called sibling nodes (twins or siblings), and nodes without child nodes are called leaf nodes:

insert image description here

As the figure above shows, the hierarchical model is like an inverted tree, and each node's parent is unique.


Data Manipulation and Integrity Constraints for Hierarchical Models


Data manipulation in the hierarchical model mainly consists of query, insert, delete and update. These operations must satisfy the integrity constraints of the hierarchical model:

  • Insert: a child node value cannot be inserted if the corresponding parent node value does not exist.
  • Delete: when a parent node value is deleted, the corresponding child node values are deleted with it.
  • Update: if the updated node has child nodes, all corresponding records should be updated to keep the data consistent.
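
The insert and delete rules above can be sketched with a small in-memory tree. This is a toy model, not how a real hierarchical DBMS is implemented; the class HierarchicalStore and the record names are hypothetical.

```python
class HierarchicalStore:
    """Toy hierarchical store: every node except the root has exactly one parent."""

    def __init__(self, root_key):
        self.parent = {root_key: None}   # node -> its single parent
        self.children = {root_key: []}   # node -> list of child nodes

    def insert(self, key, parent_key):
        # Insert rule: a child cannot exist without its parent.
        if parent_key not in self.parent:
            raise KeyError(f"parent {parent_key!r} does not exist")
        if key in self.parent:
            raise ValueError(f"{key!r} already exists")
        self.parent[key] = parent_key
        self.children[key] = []
        self.children[parent_key].append(key)

    def delete(self, key):
        # Delete rule: deleting a parent also deletes all of its descendants.
        for child in list(self.children[key]):
            self.delete(child)
        parent_key = self.parent.pop(key)
        del self.children[key]
        if parent_key is not None:
            self.children[parent_key].remove(key)


db = HierarchicalStore("Department D02")
db.insert("Teaching office R01", parent_key="Department D02")
db.insert("Student S63871", parent_key="Department D02")
db.insert("Teacher E2101", parent_key="Teaching office R01")

try:
    db.insert("Teacher E9999", parent_key="Teaching office R99")
    orphan_rejected = False
except KeyError:
    orphan_rejected = True

db.delete("Teaching office R01")   # also removes "Teacher E2101"
remaining = sorted(db.parent)
```

After the delete, only the department and the student remain, which is the cascading behaviour the second rule describes.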

Advantages and disadvantages of the hierarchical model


  • Advantages:
    1. The data structure of the hierarchical model is simple and clear.
    2. Query efficiency is high. For applications whose relationships between entities are fixed and predefined, performance is better than the relational model and no worse than the network model.
    3. The hierarchical model provides good integrity support.
  • Disadvantages:
    1. Many relationships in the real world are non-hierarchical, such as many-to-many relationships.
    2. Cases such as a node with multiple parent nodes are awkward to represent in a hierarchical model.
    3. Insert and delete operations are subject to many restrictions.
    4. A child node can only be queried through its parent node.
    5. Because the structure is so tight, hierarchical commands tend to be procedural.

1.2.6 Network model


In the real world, most relationships between things are non-hierarchical. A hierarchical model cannot represent a non-tree structure directly, but a network model overcomes this limitation.


Data Structures for Network Models


A set of basic hierarchical relationships in a database that satisfies the following two conditions is called a network model:

  • More than one node may have no parent.

  • A node may have more than one parent.

    The network model is a more general structure than the hierarchical model. It removes the two restrictions of the hierarchical model: multiple nodes may have no parent, and a node may have multiple parents. It also allows more than one relationship between two nodes (called composite links). The network model can therefore describe the real world more directly, and the hierarchical model is in fact a special case of the network model.

insert image description here

As the figure above shows, the relationship between a child node and its parent is unique in the hierarchical model, whereas in the network model it need not be.
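
The two conditions that separate the models can be checked mechanically. The sketch below represents a structure as (parent, child) edges and tests only the two conditions stated for the hierarchical model; it does not check connectivity or cycles. The function name is_hierarchical and the example record types are assumptions for illustration.

```python
def is_hierarchical(nodes, edges):
    """Check the two hierarchical-model conditions on (parent, child) edges.

    1. Exactly one node has no parent (the root).
    2. No node has more than one parent.
    """
    parent_count = {node: 0 for node in nodes}
    for _parent, child in edges:
        parent_count[child] += 1
    roots = [node for node, count in parent_count.items() if count == 0]
    return len(roots) == 1 and all(count <= 1 for count in parent_count.values())


# A tree: one root, every other node has a single parent.
tree_nodes = ["Department", "Teaching office", "Student", "Teacher"]
tree_edges = [("Department", "Teaching office"),
              ("Department", "Student"),
              ("Teaching office", "Teacher")]

# A network: two parentless nodes, and "Enrollment" has two parents.
net_nodes = ["Student", "Course", "Enrollment"]
net_edges = [("Student", "Enrollment"), ("Course", "Enrollment")]

tree_ok = is_hierarchical(tree_nodes, tree_edges)
net_ok = is_hierarchical(net_nodes, net_edges)
```

The second structure is a typical way a network model expresses a many-to-many relationship between students and courses: a linking record type with two parents, which the hierarchical conditions forbid.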


Data Manipulation and Integrity Constraints for Network Models


Data manipulation in the network model is the same as in the hierarchical model.

The network model does not have integrity constraints as strict as those of the hierarchical model, but a specific network database system still places some restrictions on data manipulation and provides certain integrity constraints.


Advantages and disadvantages of the network model


  • Advantages:
    1. It describes the real world more directly; for example, a node can have multiple parents.
    2. It has good performance and high access efficiency.
  • Disadvantages:
    1. The structure is relatively complex, and as the application environment grows the database structure becomes more and more complex, which makes it hard for end users to master.
    2. The DDL and DML are complex and not easy for users to use.
    3. Users must understand the details of the system structure, which increases the burden of writing applications.

1.2.7 Relational Model (Key Points)


The relational model is the most important data model. Relational database systems use the relational model to organize data.


Data Structures for Relational Models


  • The relational model is built on a rigorous mathematical foundation.
  • From the user's point of view, the logical structure of data in the relational model is a two-dimensional table made up of rows and columns.
  • A relation must be normalized, that is, it must satisfy certain normalization conditions. The most basic condition is that each component of a relation must be an indivisible data item.

The following figure is used to introduce the basic concepts of the relational model:

insert image description here

  • Relation: A relation corresponds to what is usually called a table.

    • For example: the student registration table in the figure.
  • Tuple: A row in the table is a tuple.

  • Attribute: A column in the table is an attribute. Each attribute is given a name, the attribute name.

    • For example: the table in the figure has 6 columns, corresponding to 6 attributes (student number, name, age, gender, department name and grade).
  • Key: An attribute group in the table that uniquely identifies a tuple.

    • For example: the student number in the figure uniquely identifies a student, so it is the key of this relation.
  • Domain: A domain is a set of values of the same data type. The values of an attribute are drawn from a domain.

    • For example: a person's age generally lies between 1 and 120, the domain of the age attribute for college students is (15 to 45), the domain of gender is (male, female), and the domain of department name is the set of all department names of a school.
  • Component: One attribute value in a tuple.

  • Relation schema: The description of a relation, generally written as relation name (attribute 1, attribute 2, ..., attribute n).

    • For example: the relation above can be described as student (student number, name, age, gender, department name, grade).

    The relational model requires that a relation be normalized, that is, it must satisfy certain normalization conditions. The most basic of these is that each component of a relation must be an indivisible data item; a table nested inside a table is not allowed.
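
To tie the terms together, here is a small sketch that treats a relation as a list of Python tuples and checks the properties just described: every tuple matches the relation schema, every component is indivisible, attribute values come from their domains, and the key is unique. The English attribute names, the sample rows and the function is_valid_relation are assumptions for illustration; the domains for age and gender follow the example in the text.

```python
SCHEMA = ("student_number", "name", "age", "gender", "department", "grade")
KEY = ("student_number",)
DOMAINS = {
    "age": range(15, 46),           # 15 to 45 inclusive
    "gender": {"male", "female"},
}


def is_valid_relation(tuples):
    """Check schema arity, atomic components, domains and key uniqueness."""
    seen_keys = set()
    for row in tuples:
        if len(row) != len(SCHEMA):
            return False
        # Each component must be an indivisible value, not a nested table.
        if any(isinstance(value, (list, tuple, set, dict)) for value in row):
            return False
        named = dict(zip(SCHEMA, row))
        if any(named[attr] not in domain for attr, domain in DOMAINS.items()):
            return False
        key_value = tuple(named[attr] for attr in KEY)
        if key_value in seen_keys:
            return False
        seen_keys.add(key_value)
    return True


good = [("2021001", "Li Yong", 20, "male", "Computer Science", 2021),
        ("2021002", "Liu Chen", 19, "female", "Computer Science", 2021)]
duplicate_key = good + [("2021001", "Wang Min", 18, "female", "Mathematics", 2021)]
nested = [("2021003", "Zhang Li", 19, "male", ["CS", "Mathematics"], 2021)]
out_of_domain = [("2021004", "Zhao Lei", 50, "male", "Mathematics", 2021)]

results = [is_valid_relation(r)
           for r in (good, duplicate_key, nested, out_of_domain)]
```

Only the first relation passes; the others fail on key uniqueness, atomicity and the age domain respectively.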


Data Manipulation and Integrity Constraints of Relational Model


Data manipulation consists of query, insert, delete and update. These operations must satisfy the integrity constraints of the relational model:

  • Operations are set-oriented: both the operand and the result are relations, that is, sets of tuples.
  • The access path is hidden from the user, who only states "what to do" or "what to look for", not "how to do it" or "how to find it".
  • Operations must respect entity integrity, referential integrity and user-defined integrity.
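
A short sketch of what "set-oriented" and "what, not how" mean in practice, again using Python's sqlite3 module with made-up table contents. The query names the condition it wants and nothing else; how the rows are located is left to the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sno TEXT PRIMARY KEY, name TEXT, age INTEGER, dept TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [("2021001", "Li Yong", 20, "CS"),
     ("2021002", "Liu Chen", 19, "CS"),
     ("2021003", "Wang Min", 18, "Math"),
     ("2021004", "Zhang Li", 21, "Math")],
)

# Query: states WHAT is wanted; the result is itself a set of tuples.
result = set(conn.execute("SELECT name, dept FROM student WHERE age >= 20"))

# Update: one statement acts on every tuple that satisfies the condition.
conn.execute("UPDATE student SET age = age + 1 WHERE dept = 'CS'")
cs_ages = sorted(
    age for (age,) in conn.execute("SELECT age FROM student WHERE dept = 'CS'")
)
```

Neither statement contains a loop over records or a reference to an index or pointer, in contrast with the record-at-a-time navigation of the hierarchical and network models.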

Advantages and disadvantages of the relational model


  • Advantages:
    1. It is built on rigorous mathematical concepts.
    2. It has a single concept, and the data structure is simple and clear, so it is easy for users to understand and use. Entities and all kinds of relationships between them are represented by relations, and the results of retrieving and updating data are relations too.
    3. The access path is transparent to users, which gives higher data independence and better security and confidentiality, and simplifies the work of programmers and of building the database.
  • Disadvantages:
    1. Because the access path is hidden from the user, query efficiency is often lower than in formatted data models.
    2. To improve performance, the DBMS must optimize user queries, which makes the database management system harder to develop.

1.3 Structure of the database system


  • From the perspective of the database management system, the database system has a three-level schema structure.
  • From the perspective of database end users, the structures are: single-user, master-slave, distributed, client/server, and browser/application server/database server.

1.3.1 The concept of database system schema


  • Type: A description of the structure and attributes of a certain kind of data.
  • Value: A concrete assignment of a type.
  • Schema: The description of the logical structure and characteristics of all the data in the database. It involves only the description of types and reflects the structure of the data and its relationships. A schema is relatively stable.
  • Instance: One specific value of a schema, reflecting the state of the database at a certain moment. The same schema can have many instances, and instances change as the data in the database is updated.

1.3.2 The three-level schema structure of the database system


The three-level schema structure of the database system means that the database system is composed of three levels: external schema, schema and internal schema:

insert image description here


Schema


The schema is also called the logical schema. Its characteristics are as follows:

  • It describes the logical structure and characteristics of all the data in the database and is the common data view of all users, combining the needs of all of them. It is the middle layer of the three-level schema structure, and a database has only one schema.
  • A schema is a view of the database data at the logical level, based on a data model.
  • Content: the logical structure of the data (such as the name, type and value range of each data item), the relationships between the data, and the security and integrity requirements that apply to the data.
  • A database management system provides a schema data definition language (schema DDL) for defining schemas strictly.

External schema


The external schema is also called the subschema or user schema:

  • It describes the logical structure and characteristics of the local data that a database user (an application programmer or an end user) can see and use. It is the user's data view, a logical representation of the data related to one application, and it sits between the schema and the applications.
  • Relationship between the schema and external schemas: one-to-many. An external schema is usually a subset of the schema, and a database can have multiple external schemas. They reflect different users' application requirements, ways of viewing data and data confidentiality requirements. The same data in the schema can differ in structure, type, length and confidentiality level from one external schema to another.
  • Relationship between external schemas and applications: one-to-many. The same external schema can be used by several applications of a user, but an application can use only one external schema.

Internal schema


The internal schema is also called the storage schema:

  • It describes the physical structure and storage method of the data, that is, how the data is organized inside the database.
  • Content: how records are stored (sequentially, in a B-tree structure, or by hashing), how indexes are organized, whether data is compressed, whether data is encrypted, and the rules for the structure of stored records.
  • A database has only one internal schema.

1.3.3 Two-level mappings and data independence of the database


External schema/schema mapping


  • Content: defines the correspondence between an external schema and the schema. Each external schema has its own external schema/schema mapping, and the mapping definition is usually included in the description of that external schema.

  • Use:

    • Guarantees the logical independence of data.

    • When the schema changes, the database administrator modifies the relevant external schema/schema mappings so that the external schemas remain unchanged. Application programs are written against the external schema;

    • so the application programs do not need to be modified. This independence of programs from the logical structure of the data is called the logical independence of data.
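
In a relational DBMS, a view is one common way to realize an external schema, with the view definition acting as the external schema/schema mapping. The sketch below, using Python's sqlite3 module, assumes that reading and uses hypothetical table and view names. The schema is restructured, the view is redefined, and the application's query text stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original schema, plus a view that serves as the external schema.
conn.executescript("""
CREATE TABLE student (sno TEXT PRIMARY KEY, name TEXT, dept_name TEXT);
INSERT INTO student VALUES ('2021001', 'Li Yong', 'Computer Science');
INSERT INTO student VALUES ('2021002', 'Wang Min', 'Mathematics');
CREATE VIEW student_view AS SELECT sno, name, dept_name FROM student;
""")


def application_query():
    """The application only ever refers to the view, never to base tables."""
    return conn.execute(
        "SELECT sno, name, dept_name FROM student_view ORDER BY sno"
    ).fetchall()


before = application_query()

# Schema change: department names move into a table of their own.
# Only the mapping (the view definition) is rewritten to match.
conn.executescript("""
DROP VIEW student_view;
CREATE TABLE department (dno INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE student_v2 (
    sno TEXT PRIMARY KEY, name TEXT, dno INTEGER REFERENCES department(dno)
);
INSERT INTO department (dept_name) SELECT DISTINCT dept_name FROM student;
INSERT INTO student_v2
    SELECT s.sno, s.name, d.dno
    FROM student s JOIN department d ON d.dept_name = s.dept_name;
DROP TABLE student;
CREATE VIEW student_view AS
    SELECT s.sno, s.name, d.dept_name
    FROM student_v2 s JOIN department d ON d.dno = s.dno;
""")

after = application_query()
```

The function application_query is untouched by the restructuring and returns the same rows before and after, which is the behaviour logical independence promises.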


Schema/internal schema mapping


  • Content: the schema/internal schema mapping defines the correspondence between the global logical structure of the data and its storage structure. There is only one schema/internal schema mapping in a database, and its definition is usually included in the schema description.
  • Use:
    • Guarantees the physical independence of data.
    • When the storage structure of the database changes (for example, a different storage structure is chosen), the database administrator modifies the schema/internal schema mapping so that the schema remains unchanged and application programs are not affected.
    • This independence of programs from the physical storage of the data is called the physical independence of data.
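
Physical independence can be observed on a small scale by changing how data is stored and accessed while leaving the application's query alone. The sketch below uses Python's sqlite3 module and adds an index as a stand-in for "another storage structure"; the table, index name and rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sno TEXT PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("2021001", "Li Yong", "CS"),
     ("2021002", "Liu Chen", "CS"),
     ("2021003", "Wang Min", "Math")],
)

# The application's query, written once against the logical schema.
QUERY = "SELECT sno FROM student WHERE dept = ? ORDER BY sno"

before = conn.execute(QUERY, ("CS",)).fetchall()

# Physical change: add an index on dept. The logical schema is unchanged.
conn.execute("CREATE INDEX idx_student_dept ON student (dept)")

after = conn.execute(QUERY, ("CS",)).fetchall()
```

The query text and its result are identical before and after; only the access path available to the DBMS has changed.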

Origin blog.csdn.net/LYS00Q/article/details/129091894