Database must-know series: data model and database design

Author: Zen and the Art of Computer Programming

1. Background introduction

A data model is the blueprint for a database system. It specifies the tables, the fields, and the relationships between them, and it determines how data is added, deleted, modified, and queried. A well-chosen data model improves the performance, reliability, and efficiency of a database. In enterprise-level applications it makes databases easier to manage and use, lowers development difficulty, and raises development efficiency. It also directly affects how efficiently the database can be operated and maintained, so designing a database well requires a solid understanding of the data model.

What I want to share today is the "Must-Know Database Series: Data Model and Database Design". It is a comprehensive book that covers the basic concepts, importance, and key characteristics of data models, the logical and physical models, and the entity-relationship, hierarchical, and network models, each explained in detail. It also explains design principles commonly used in data modeling and database design practice, such as denormalization, partition design, index design, and ERP schema design, and uses real cases to show how to build a data model that meets business needs.

This book is suitable as a basic course for enterprise-level applications and for small and medium-sized database applications. It can also serve as a reference for interviews and for learning data structures and algorithms. After reading it, readers should have the basic knowledge and methodology of data models, understand how different data models affect performance, cost, and scalability, and know how to convert between them. Readers can also use the cases in the book to practice database design and build their own habits for daily work.

2. Core concepts and connections

2.1 Basic concepts of data model

A data model is the logical structure used to organize, store, and describe data. It describes which relationships exist between the things in a given collection and how those relationships, and the data itself, are organized. This article divides data models into three types:

  • Entity-Relationship Model
  • Object Model
  • Hierarchical Model

Entity-relationship model (ER model): It builds a data model from real-world entities, their attributes, and the relationships between them. It takes a "who", "what", "where" and "how" perspective. It mainly includes the following concepts:

  1. Entity: Represents an object in the real world, such as a person, a piece of property, or a matter. An entity usually has an identifier (primary key) that uniquely identifies it.
  2. Attribute: Describes one aspect of an entity, such as name, address, age, or gender. Each attribute has a name and a value, which can be a single value or a collection of values.
  3. Relationship: A connection between entities, such as spouses, teacher and student, friends, or fan and idol. A relationship usually has a name, a direction, and a set of attributes.
  4. Constraint: A restriction on entities and relationships, for example that no two properties may share the same name, or that each person may have at most three friends.
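The four concepts above can be sketched as a small relational schema. The following is only an illustration using Python's standard-library sqlite3 module; the `person` and `friendship` tables, their columns, and the `try_insert` helper are hypothetical names chosen for this example and do not come from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,        -- entity identifier (primary key)
    name      TEXT NOT NULL UNIQUE,       -- attribute with a uniqueness constraint
    age       INTEGER CHECK (age >= 0)    -- attribute with a value constraint
);
CREATE TABLE friendship (                 -- relationship between two person entities
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    friend_id INTEGER NOT NULL REFERENCES person(person_id),
    since     TEXT,                       -- attribute of the relationship itself
    PRIMARY KEY (person_id, friend_id)
);
""")

conn.execute("INSERT INTO person (person_id, name, age) VALUES (1, 'Alice', 30)")
conn.execute("INSERT INTO person (person_id, name, age) VALUES (2, 'Bob', 28)")
conn.execute("INSERT INTO friendship VALUES (1, 2, '2020-01-01')")


def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second 'Alice' violates the UNIQUE constraint on name.
duplicate_name_ok = try_insert(
    "INSERT INTO person (person_id, name, age) VALUES (3, 'Alice', 41)"
)
# A friendship pointing at a non-existent person violates the foreign key.
dangling_friend_ok = try_insert("INSERT INTO friendship VALUES (1, 99, NULL)")
```

Both rejected inserts leave the tables unchanged, which is the point of declaring constraints in the model rather than checking them in application code.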

Object Model: The object model is an abstract, object-oriented modeling method that treats each entity of the data model as an object with properties and methods. The object model can support complex data structures and dynamic queries. It mainly includes the following concepts:

  1. Object: An entity with certain functions and characteristics, such as a passenger aircraft, a ticket, a bank account, a mechanical device, a file, or a document.
  2. Attribute: A characteristic of the object, such as height, width, color, or price.
  3. Operation: The ability of an object to perform a certain function, such as flying, opening a door, or making a deposit.
  4. Constraint: A restriction on the data, such as requiring that a value is not empty or that it falls within a given range.
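As a minimal sketch of these concepts, the class below models the bank account example from the list. The class name `BankAccount`, its fields, and the specific validation rules are assumptions made for illustration.

```python
class BankAccount:
    """Hypothetical object with attributes, an operation, and constraints."""

    def __init__(self, owner: str, balance: float = 0.0):
        # Constraints: the owner must not be empty, the balance must not be negative.
        if not owner:
            raise ValueError("owner must not be empty")
        if balance < 0:
            raise ValueError("balance must not be negative")
        self.owner = owner      # attribute
        self.balance = balance  # attribute

    def deposit(self, amount: float) -> None:
        """Operation: making a deposit changes the object's state."""
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self.balance += amount


account = BankAccount("Alice", 100.0)
account.deposit(50.0)
```

After the deposit, `account.balance` is 150.0, and a call such as `account.deposit(-1)` raises `ValueError` instead of corrupting the state.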

Hierarchical Model: The hierarchical model is a tree-like data model that represents the storage structure of data as a hierarchical structure, so that data can be easily queried from top to bottom or bottom to top. The hierarchical model mainly includes the following concepts:

  1. Node: An element of the hierarchy, corresponding to an entity or object in the real world.
  2. Root Node: The top of the tree, which serves as the entry point to the entire data set.
  3. Branch Node: Any node other than the root node.
  4. Edge: The connection between a parent node and one of its child nodes.
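The top-down and bottom-up queries described above can be sketched with a small tree class. This is an illustrative sketch only; the `Node` class, its method names, and the department names are hypothetical.

```python
class Node:
    """A node in a hierarchy; each node has at most one parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # edge to the parent (None for the root node)
        self.children = []    # edges to the child nodes

    def add_child(self, name):
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def descendants(self):
        """Top-down query: depth-first, pre-order walk below this node."""
        for child in self.children:
            yield child.name
            yield from child.descendants()

    def path_to_root(self):
        """Bottom-up query: follow parent links until the root is reached."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


root = Node("company")
engineering = root.add_child("engineering")
sales = root.add_child("sales")
backend = engineering.add_child("backend")

top_down = list(root.descendants())  # ['engineering', 'backend', 'sales']
bottom_up = backend.path_to_root()   # ['backend', 'engineering', 'company']
```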

2.2 Importance of data model

Data models are as important as mathematical models, programming languages, and operating systems. A data model defines the structure and relationships of data, and so guides the design, optimization, and management of applications. It is usually designed by the database administrator, and a standardization process keeps it consistent. The data model covers topics such as data specification, data structure, data dependencies, data integrity, data redundancy, data access control, data confidentiality, data recovery, and data recovery time.

2.3 Key characteristics of data models

There are four key characteristics of the data model:

  1. Logic: The data model must be able to accurately express business logic. It should be simple enough to make the design, maintenance, and modification of the information system easy.
  2. Evolution: The data model should be able to adapt to change. As the business develops, new data and new data processing methods should be able to be incorporated into it.
  3. Integrity: The data model should ensure data integrity and prevent data loss or tampering. It should be able to identify data redundancy and reduce database size and overhead.
  4. Spatiality: The data model should consider the location and capacity of data storage, and the cost in space and time should be minimized.

2.4 Logical model

A logical model is a form of data model that describes the logical structure and behavior of data in a computer. Logical models help in understanding business requirements and the way users manipulate data. The logical model includes the following topics:

  1. Entity-relationship model (ER model): A form of logical model in which entities and their relationships are mapped to two-dimensional tables. The ER model helps establish a conceptual model and match it to the database implementation.
  2. Normal Forms: A normal form is a structural design rule used to normalize a relational schema. Normal forms reduce redundancy and help keep inserts, updates, and deletes consistent.
  3. View: A view is a virtual table defined by a query over one or more tables. Views can hide irrelevant details and expose data on demand.
  4. Procedure: A stored procedure is a reusable block of code that is saved, managed, and invoked inside the database. Stored procedures help automate, standardize, and encapsulate data operations.
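A minimal sketch of normalization and views, again using Python's standard-library sqlite3 module. The `customer` and `purchase` tables and the `customer_total` view are hypothetical names. SQLite has no stored procedures, so that concept is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: the customer name is stored once, not on every purchase.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
-- View: a virtual table that hides the join and the aggregation.
CREATE VIEW customer_total AS
    SELECT c.name AS name, SUM(p.amount) AS total
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name;

INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO purchase VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 7.5);
""")

query = "SELECT name, total FROM customer_total ORDER BY name"
totals = conn.execute(query).fetchall()

# Because the name lives in one row, a rename is a single-row update,
# and the view reflects it immediately.
conn.execute("UPDATE customer SET name = 'Alicia' WHERE customer_id = 1")
totals_after = conn.execute(query).fetchall()
```

Here `totals` is `[('Alice', 25.0), ('Bob', 7.5)]`, and after the rename the view returns `[('Alicia', 25.0), ('Bob', 7.5)]` without any change to the `purchase` rows.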

2.5 Physical model

The physical model refers to the physical storage form of data in the database. The physical model helps select the database's storage engine, disk distribution, indexing strategy, buffer pool size, etc. The physical model covers the following topics:

  1. Relational database: The relational database is the most commonly used form of data model. It uses the relational model and stores data in tables. Relational databases support complex queries over the data, such as joins and aggregations.
  2. Key-value storage: Key-value storage uses an unordered collection of key-value pairs to store data, where the value can be any type of data. Key-value stores help find data quickly by key and handle large amounts of data.
  3. Document storage: Document storage stores data in the form of documents, such as JSON or XML. Document stores allow flexible data models and support dynamic queries.
  4. Column storage database: A column storage database stores data by column, so the values of each column are kept together. Column stores help process massive amounts of data and suit analytical (OLAP) workloads rather than OLTP.
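The difference between row-oriented, key-value, and column-oriented layouts can be sketched with plain Python data structures. This is a toy in-memory illustration under assumed sample data, not how any particular database engine stores bytes on disk.

```python
# Three sample records; the field names are hypothetical.
rows = [
    {"id": 1, "city": "Paris", "amount": 20.0},
    {"id": 2, "city": "Berlin", "amount": 5.0},
    {"id": 3, "city": "Paris", "amount": 7.5},
]

# Row-oriented / key-value layout: one whole record per key.
# Fetching or writing a complete record by its key is a single lookup.
row_store = {row["id"]: row for row in rows}

# Column-oriented layout: one list per column.
# Scanning or aggregating a single column touches only that column's values.
column_store = {key: [row[key] for row in rows] for key in rows[0]}

record = row_store[2]                       # whole record for id 2
total_amount = sum(column_store["amount"])  # aggregate over one column
```

The lookup `row_store[2]` is the typical transactional access pattern, while `sum(column_store["amount"])` is the typical analytical one.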

2.6 ERP model

The ERP (Enterprise Resource Planning) model focuses on the analysis, planning, and decision-making of an enterprise's production and operation processes. It helps improve the competitiveness and efficiency of enterprises and supports sustained investment in production. The ERP model contains the following topics:

  1. Financial Management: The financial management module covers financial markets, cash flow management, financial reporting, budget management, and profit distribution.
  2. Procurement Management: The procurement management module helps enterprises formulate reasonable procurement strategies, handle supplier quotations, and track procurement progress and payments.
  3. Inventory Management: The inventory management module helps companies manage inventory, handle expired inventory and retail returns, and meet customer requirements.
  4. Human Resources Management: The human resources management module helps enterprises optimize human resources and manage salaries and incentives.

2.7 Conceptual model

A conceptual model is an abstraction of entities, relationships, and rules in the real world. Conceptual models can be used in software engineering design, system analysis, database design, data modeling and other fields. The conceptual model contains the following topics:

  1. Entity: An entity in the conceptual model is an abstraction of something in the real world. It can represent a type of thing or object, such as personnel, goods, or departments.
  2. Attribute: A characteristic possessed by an entity, such as name, contact information, or email address.
  3. Relationship: A connection between entities, such as husband and wife, teacher and student, father and son, or friends.
  4. Rule: A restriction or constraint that may exist between entities.

2.8 ERWin model

The ERWin model is an entity-relationship window model, which adds the concept of a form to the entity-relationship model to represent business processes and business rules. The ERWin model can be used as an alternative to business process diagrams to compile and collaborate on the design of data models. The ERWin model contains the following topics:

  1. Form: A form in the ERWin model is a virtual component that describes the interaction between activities, events, and entities. Forms help people understand business processes and help the system enforce business rules.
  2. Entity Set: An entity set is a collection of entities. It can be used to specify the properties and constraints of specific entities.
  3. Association Set: An association set is a collection of relationships between entities. It can specify the name of a relationship and the names of the entities at both ends.
  4. Connector: A connector links forms, entity sets, and association sets, and specifies how they are connected to one another.

2.9 Object-oriented model

Object-Oriented Modeling is a process of mapping data models to object-oriented programming. The object-oriented model makes data an object that can be manipulated and processed. The object-oriented model covers the following topics:

  1. Class: A class in the object-oriented model is a template that groups a set of attributes and operations. Instances of the class are used to create, delete, modify, and query data.
  2. Object: An instance of a class.
  3. Attribute: A state variable of a class, held as a member variable by each instance.
  4. Operation: A behavior function of a class, used to read or change the state of its instances.
  5. Inheritance: A derived class obtains attributes and operations from a base class.
  6. Polymorphism: The same operation call runs different implementations depending on the actual type of the object.
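Inheritance and polymorphism from the list above can be sketched as follows. The `Shape`, `Rectangle`, and `Square` classes are hypothetical and serve only as an illustration.

```python
class Shape:
    """Base class: declares the operation every shape must provide."""

    def __init__(self, name):
        self.name = name  # attribute shared by all derived classes

    def area(self):
        raise NotImplementedError("derived classes must implement area()")


class Rectangle(Shape):
    """Derived class: inherits `name` and implements area()."""

    def __init__(self, width, height, name="rectangle"):
        super().__init__(name)
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """Derived from Rectangle: reuses its attributes and its area() operation."""

    def __init__(self, side):
        super().__init__(side, side, name="square")


shapes = [Rectangle(2, 3), Square(4)]
# Polymorphism: the same call works on every shape, whatever its actual type.
areas = [shape.area() for shape in shapes]  # [6, 16]
names = [shape.name for shape in shapes]    # ['rectangle', 'square']
```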

2.10 Distributed data model

Distributed Data Model is a model that extends the data model to a distributed computing environment. The distributed data model covers the following topics:

  1. Replication Mechanisms: Replication keeps copies of the same data on several nodes of a distributed database. It helps avoid single points of failure and improves read performance and reliability.
  2. Data Partitioning: Data partitioning divides data into multiple subsets and distributes them to different machines. It can speed up queries and transactions, and improve availability and scalability.
  3. Transaction Processing: Transaction processing ensures the consistency and integrity of data. In a distributed setting a transaction may span several nodes and must commit or roll back on all of them.
  4. Fault Tolerance: Fault tolerance mechanisms detect failed nodes and recover from them, which improves availability and reliability.
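Partitioning and replication can be sketched together with simple hash-based placement. This is a simplified illustration, not the scheme of any specific database: the node names, the `partition_for` and `replicas_for` functions, and the "next node in the list" replica rule are all assumptions made for the example.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (same key, same partition)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replicas_for(key: str, nodes, replication_factor: int = 2):
    """Pick the primary node for a key plus the following nodes as replicas."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    start = partition_for(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


placement = replicas_for("user:42", NODES)
```

`placement` always holds two distinct nodes for the same key, so if the first node fails the data can still be read from the second. A real system would also need to handle rebalancing when nodes are added or removed, which plain modulo hashing does poorly.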

Origin blog.csdn.net/universsky2015/article/details/133594361