System Architect's Notes No. 16: Basic Database Concepts

The development of database technology

Database technology has undergone significant development and evolution over the past few decades.

Hierarchical databases and network databases: In the 1960s and early 1970s, hierarchical databases and network databases were the mainstream database models. Hierarchical databases use a tree-like structure to organize data, while network databases use a more general graph structure in which a record can have several parents. These models suit specific data organization and query needs, but they lack flexibility and ease of use.

Relational databases: The relational model, proposed by E. F. Codd in 1970 and implemented in database systems over the following decade, led to a revolution in database technology. Relational databases organize data into tables, rows, and columns, and use SQL (Structured Query Language) to query and manipulate data. The relational model simplifies data organization and querying, provides greater flexibility, and has become the mainstream of the industry.

Object databases: Object database technology emerged in the late 1980s and early 1990s. Object databases apply object-oriented concepts to data management, allowing complex objects and data structures to be stored and manipulated directly. They are better suited to object-oriented applications and complex data models, but due to technical and market constraints they have not replaced relational databases as the mainstream.

NoSQL databases: With the rapid development of the Internet and the increasing demand for large-scale data processing, NoSQL (Not Only SQL) databases emerged in the late 2000s and early 2010s. NoSQL databases focus mainly on high performance, scalability, and flexibility, and relax the constraints on data structures. They include key-value stores, document databases, wide-column stores, and graph databases.

NewSQL databases: NewSQL databases are an improvement and extension of traditional relational databases, designed to provide performance and scalability similar to NoSQL databases while maintaining the transactional consistency and data integrity of relational databases. They pursue this goal through optimized storage engines, distributed architectures, and parallel processing.

Distributed databases: With the rise of big data and distributed computing, distributed databases have become an important field. A distributed database spreads data across multiple nodes for high performance, high availability, and fault tolerance. It uses techniques such as distributed transactions, consensus protocols, and data replication to manage the distributed data.

Cloud databases: With the popularity of cloud computing, the cloud database has become an important database deployment mode. A cloud database is offered as a service on a cloud platform, so users can obtain and use database resources on demand without handling infrastructure maintenance and management themselves. Cloud databases also provide features such as high availability, elastic scaling, and data security.

In addition to the above technological developments, database technology also involves the integration and application of data warehousing, data mining, real-time analysis, artificial intelligence and machine learning. Database technology plays a vital role in data management, data analysis, and decision support, and continues to drive data-driven innovation and business development.

Basic concepts

A database is a system for storing and organizing data. The following are some basic concepts of databases:

  1. Data: Data is information that describes things, entities, or concepts. Data in a database can be in the form of numbers, text, images, audio, etc.
  2. Database Management System (DBMS): A database management system is software used to manage and operate databases. It provides a set of functions and tools that enable users to create, access, update, and manage databases.
  3. Table: A table is the basic organizational unit in a database and is used to store related data. A table consists of rows and columns: each row represents one record (a data instance), and each column represents an attribute (field) of the data.
  4. Field: A field is a single data element in a table that represents a specific attribute of the data. Each field has a name and a data type such as integer, string, date, etc.
  5. Record: A record is a row in a table, representing a complete data instance. It consists of a set of field values, one for each field of the table.
  6. Primary Key: A primary key is a field or combination of fields that uniquely identifies each record in a table. It guarantees that records are unique and identifiable, and it is the target that foreign keys in other tables reference.
  7. Foreign Key: A foreign key is a field or combination of fields that forms an association with the primary key of another table. Foreign keys are used to establish relationships and references between tables to achieve data consistency and integrity.
  8. Index: An index is a data structure used to improve the performance of database queries. It stores the values of one or more columns of a table together with the positions of the corresponding rows, so that lookups do not have to scan the whole table.
  9. Query: A query is an instruction written in a specific language (such as Structured Query Language, SQL) to retrieve and manipulate data from a database. Queries can be used to search, filter, sort, and combine data.
  10. View: A view is a named query over one or more tables whose result is presented to the user as a virtual table. Views simplify complex query operations and provide logical access to specific data.

These fundamental concepts form the core components of a database and provide the basis for the organization, storage, and manipulation of data. The design and use of databases involves more concepts and technologies, such as normalization, transaction processing, concurrency control, etc., to meet the needs of data management.
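
The concepts in the list above can be seen together in a small sketch. It assumes Python's standard `sqlite3` module with an in-memory database; the `department` and `employee` tables, the `idx_employee_dept` index, and the `employee_directory` view are hypothetical names chosen only for illustration.

```python
import sqlite3

# In-memory database; all table, index, and view names below are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
-- Tables: fields (columns) with data types, plus primary and foreign keys.
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
-- Index: speeds up lookups of employees by department.
CREATE INDEX idx_employee_dept ON employee(dept_id);
-- View: a named query presented as a virtual table.
CREATE VIEW employee_directory AS
    SELECT e.name AS employee, d.name AS department
    FROM employee AS e
    JOIN department AS d ON d.dept_id = e.dept_id;
""")

# Records: each tuple becomes one row.
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Engineering"), (2, "Sales")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(10, "Ada", 1), (11, "Grace", 1), (12, "Linus", 2)])
conn.commit()

# Query: read through the view, sorted by employee name.
rows = conn.execute(
    "SELECT employee, department FROM employee_directory ORDER BY employee"
).fetchall()
print(rows)
```

In this sketch, inserting an employee whose `dept_id` matches no department is rejected with `sqlite3.IntegrityError`, which is the foreign key doing its job of keeping references consistent.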

Data models

A data model is a conceptual tool for describing data structures, data relationships, and data operations. It defines how data is organized in a computer system, and how it can be manipulated and accessed. The following are common data models:

  1. Hierarchical Model: The Hierarchical Model is one of the early data models that organizes data using a tree structure. Data is connected through parent-child relationships, forming a hierarchy. Each parent node can have multiple child nodes, but each child node can only have one parent node. Hierarchical models are suitable for representing data with clear parent-child relationships, such as organizational structures, file systems, and so on.
  2. Network Model: The network model is also one of the early data models, and it organizes data using a general graph structure. Data is connected by nodes and edges. In a network model, a node can be connected to multiple other nodes and is not limited to a single parent-child relationship. The network model is suitable for representing data with complex connections, such as network topologies and component relationships.
  3. Relational Model: The relational model is one of the most commonly used data models today, and it organizes data using the structure of tables, rows, and columns. Data is stored in the form of relations (tables), each relation consists of multiple attributes (columns), and each relation instance (row) represents a data record. The relational model uses relational algebra and SQL (Structured Query Language) for data query and manipulation. The relational model provides flexibility, simplicity, and standardized data representation, and is suitable for most enterprise applications and database systems.
  4. Object Model: The object model applies object-oriented concepts to data, allowing complex objects and data structures to be stored and manipulated directly. It encapsulates data as objects, and each object contains data attributes and the methods that operate on them. The object model is suitable for object-oriented applications and for storing and querying complex data structures.
  5. Document Model: The document model is a non-relational data model for storing and manipulating semi-structured document data. The document model organizes data into a document format similar to JSON or XML, which can flexibly represent complex data structures. The document model is suitable for scenarios such as web applications and content management systems.
  6. Graph Model: A graph model is a data model for representing and processing graph data. It uses nodes and edges to describe entities and the relationships between them. The graph model is suitable for scenarios such as network analysis, social networks, and recommendation systems.

These data models have their own characteristics and are suitable for different application scenarios and data requirements. According to specific application and system requirements, choosing an appropriate data model can better organize and manage data.
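
To make the differences concrete, the following sketch represents similar facts in four of the models above using plain Python data structures. It is only an illustration of the shapes involved, not how any particular database stores data; every name in it (`parent_of`, `employees_in`, `department_doc`, `neighbors`, and the sample values) is hypothetical.

```python
import json

# Hierarchical model: every child has exactly one parent (a tree).
parent_of = {"Engineering": "Company", "Sales": "Company", "Ada": "Engineering"}

def path_to_root(node):
    """Follow parent links upward until a node without a parent is reached."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

# Relational model: flat rows in separate relations, linked by key values.
departments = [(1, "Engineering"), (2, "Sales")]        # (dept_id, name)
employees = [(10, "Ada", 1), (11, "Grace", 1)]          # (emp_id, name, dept_id)

def employees_in(dept_name):
    """Join the two relations on dept_id and project the employee names."""
    dept_ids = {dept_id for dept_id, name in departments if name == dept_name}
    return [name for _, name, dept_id in employees if dept_id in dept_ids]

# Document model: one self-contained, nested, JSON-like document.
department_doc = {
    "name": "Engineering",
    "employees": [
        {"name": "Ada", "skills": ["math"]},
        {"name": "Grace", "skills": ["compilers", "COBOL"]},
    ],
}

# Graph model: nodes connected by labeled edges, with no single-parent limit.
edges = [
    ("Ada", "WORKS_IN", "Engineering"),
    ("Grace", "WORKS_IN", "Engineering"),
    ("Ada", "MENTORS", "Grace"),
]

def neighbors(node, label):
    """Return the nodes reachable from `node` over edges with the given label."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]

print(path_to_root("Ada"))
print(employees_in("Engineering"))
print(json.dumps(department_doc, indent=2))
print(neighbors("Ada", "MENTORS"))
```

The relational version needs a join to answer "who works in Engineering", the document version answers it by reading one nested document, and the graph version answers it by following edges, which is roughly why each model fits different workloads.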

Database management system

A database management system (DBMS) is a software system used to manage and operate databases. It provides a set of functions and tools that enable users to create, access, update, and manage databases.

A database management system has the following main functions:

  1. Data Definition Language (DDL): DDL is used to define and manage the structure and schema of the database. It includes operations such as creating tables, defining fields, setting constraints, and building indexes. DDL statements create and modify the metadata of database objects such as tables, views, and indexes.
  2. Data Manipulation Language (DML): DML is used to query, insert, update, and delete data in the database. In relational systems the DML is the data manipulation part of SQL (Structured Query Language), which provides rich syntax and operators for manipulating and querying data.
  3. Data Query Language (DQL): DQL is often treated as the read-only subset of DML, used specifically for querying data in the database. It allows users to retrieve data through SQL statements and to sort, filter, and aggregate the results.
  4. Data integrity and constraints: DBMS supports the definition of data integrity constraints in the database to ensure data consistency and validity. Examples include primary key constraints, unique constraints, foreign key constraints, and other rules that restrict which data may be stored.
  5. Transaction management: DBMS supports transaction management and control to ensure data consistency and reliability. A transaction is a logical unit made up of a set of operations that either all take effect (commit) or are all undone (roll back). DBMS provides the ACID properties (Atomicity, Consistency, Isolation, and Durability) to ensure the correct execution of transactions.
  6. Database security and rights management: DBMS provides user and role management functions to control access rights to the database. It allows administrators to assign different levels of permissions to users to protect data security and confidentiality.
  7. Database backup and recovery: DBMS supports database backup and recovery functions to prevent data loss and failure. It provides tools and methods for backing up and restoring databases for data protection and disaster recovery.
  8. Performance optimization and query optimization: DBMS provides performance optimization and query optimization functions to improve database access and operation efficiency. It can speed up queries and improve system performance through technologies such as indexing, query plan optimization, and cache management.
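
Several of these functions can be exercised in a few lines. The sketch below assumes Python's standard `sqlite3` module and a hypothetical `account` table; the `transfer` and `balances` helpers are likewise illustrative names, not part of any library. It shows a DDL statement with constraints, DML inserts and updates, a DQL query, and a transaction that is rolled back as a whole when a constraint is violated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure, including integrity constraints.
conn.execute("""
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    owner      TEXT NOT NULL UNIQUE,
    balance    INTEGER NOT NULL CHECK (balance >= 0)
)""")

# DML: insert the initial rows.
conn.executemany("INSERT INTO account (owner, balance) VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        # The connection as a context manager commits on success and
        # rolls back the whole transaction if an exception escapes the block.
        with conn:
            conn.execute("UPDATE account SET balance = balance + ? WHERE owner = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE owner = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit made
        # by the first UPDATE has been rolled back too.
        return False

def balances(conn):
    """DQL: read the current state as an owner -> balance mapping."""
    return dict(conn.execute("SELECT owner, balance FROM account ORDER BY owner"))

ok = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
failed = transfer(conn, "alice", "bob", 500)
after_failed = balances(conn)
print(ok, after_ok)
print(failed, after_failed)
```

The credit is deliberately applied before the debit so that the failing transfer leaves a half-finished change behind; the unchanged balances afterwards show atomicity at work.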

Common database management systems include Oracle Database, MySQL, Microsoft SQL Server, PostgreSQL, MongoDB, etc. Each DBMS has its specific functions and characteristics, and is suitable for different application scenarios and requirements.

Three-level schema of the database

The three-level schema of the database refers to the external schema, the conceptual schema, and the internal schema, also known as the three levels of abstraction. They represent different views and descriptions of the database at different levels.

  1. External Schema: The external schema is the part of the database visible to users; it describes a user's view of the data and how that user accesses it. Each external schema defines the subset of data and the related operations that a specific user or application needs. External schemas allow users to define and manipulate data without knowing the overall structure of the database or other users' views. Through external schemas, different users can see and operate on the data in different ways, which provides customized data access.
  2. Conceptual Schema: The conceptual schema is the global logical structure and overall description of the database. It defines the logical structure, relationships, and constraints of all data in the database, independent of specific application and user requirements. The conceptual schema is the intermediate layer that connects the external schemas with the internal schema. It is the core of database design and includes entities, relationships, attributes, and constraints. The conceptual schema lets different users share the same data structure and consistent data definitions, providing data consistency and data independence.
  3. Internal Schema: The internal schema is a description of the physical storage and underlying implementation of the database. It defines how data is organized on storage media, including index structures and data storage formats. An internal schema is usually closely tied to a specific database management system (DBMS) and describes the physical representation of data at the storage level. The internal schema hides these storage details from the upper layers, so that the external and conceptual schemas can be queried and operated on independently of the physical implementation.

The design goal of the three-level schema is to achieve data independence and modularity. The external schemas let different users access the database according to their own needs, without being affected by other users and applications. The conceptual schema provides a unified data model and consistent data definitions, so that data can be shared among the different external schemas. The internal schema hides the underlying implementation details and provides physical data independence, so that the physical implementation of the database can be adjusted and optimized as needed.

Dividing the database into three schema levels makes its design and management more flexible, scalable, and maintainable. Mappings between adjacent levels (external to conceptual, and conceptual to internal) allow the database system to serve the needs of different users at the same time while the underlying implementation is managed and optimized effectively.
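
A relational DBMS does not label its objects with these three levels, but a rough analogy can be sketched with Python's standard `sqlite3` module. The mapping used here is an assumption made for illustration: the base table stands for the conceptual schema, views stand for external schemas, and an index stands for an internal-schema detail. The `employee` table, the `staff_directory` and `payroll_summary` views, and the helper functions are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level (by analogy): the shared logical structure.
conn.execute("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    dept   TEXT NOT NULL,
    salary INTEGER NOT NULL
)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (1, "Ada", "Engineering", 120),
    (2, "Grace", "Engineering", 130),
    (3, "Linus", "Sales", 90),
])

# External level (by analogy): two audiences see different views.
conn.execute("CREATE VIEW staff_directory AS SELECT name, dept FROM employee")
conn.execute("""
CREATE VIEW payroll_summary AS
    SELECT dept, SUM(salary) AS total_salary FROM employee GROUP BY dept
""")

def directory_for(dept):
    """What a directory user sees: names only, no salaries."""
    return conn.execute(
        "SELECT name FROM staff_directory WHERE dept = ? ORDER BY name", (dept,)
    ).fetchall()

def query_plan(dept):
    """Plan text reported by SQLite; the exact wording varies between versions."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM staff_directory WHERE dept = ?", (dept,)
    ).fetchall()
    return "; ".join(row[3] for row in plan)

before = directory_for("Engineering")
plan_before = query_plan("Engineering")

# Internal level (by analogy): change only the physical access path.
conn.execute("CREATE INDEX idx_employee_dept ON employee(dept)")

after = directory_for("Engineering")
plan_after = query_plan("Engineering")
payroll = conn.execute("SELECT * FROM payroll_summary ORDER BY dept").fetchall()

print(before == after)  # results are the same before and after the index
print(plan_before)
print(plan_after)
print(payroll)
```

The query against `staff_directory` is written once and returns the same rows before and after the index is created; only the access plan may differ. That is the physical data independence described above, seen from the user's side.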


Origin blog.csdn.net/u010986241/article/details/131244512