DAMA Data Management Body of Knowledge Guide (5): Data Modeling and Design

1. Data modeling and design context diagram

Data modeling is the process of discovering, analyzing, and scoping data requirements, and then representing and communicating those requirements in a precise form called a data model.

A data model helps an organization understand its data assets. The direct outcome of data modeling is not the database itself but an understanding of the organization's data.

The most common data modeling schemes are: relational, dimensional, object-oriented, fact-based, time-based, and NoSQL.

Models in each scheme can be expressed at three levels of detail (though not every scheme uses all three): the conceptual model, the logical model, and the physical model.

Each model consists of a series of components, such as entities, relationships, facts, keys, and attributes.

Because this part involves many key concepts, see the section on the core concepts of data modeling below for details.

2. Business drivers

Common business drivers for data modeling and design include:

1) Providing a common vocabulary about data.

2) Capturing and documenting detailed knowledge of the data and systems within the organization.

3) Serving as a primary communication tool during projects.

4) Providing a starting point for application customization, integration, and even replacement.

Good data modeling reduces support costs and increases the likelihood that models can be reused for future requirements, thereby reducing the cost of building new applications. Data models are also an important form of metadata.

3. Activities

1. Plan for data modeling: tasks such as evaluating organizational requirements, creating modeling standards, and determining where data models will be stored.

2. Build the data model: an iterative process of continuous refinement until the business requirements are met. The two common approaches to building data models are forward engineering and reverse engineering:

Forward engineering: the process of building an application starting from requirements. First, use a conceptual model to understand the scope of the requirements and the core terminology. Then build a logical model to describe the business processes in detail. Finally, realize the physical model through concrete table-building (DDL) statements.
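As a minimal sketch of the forward-engineering endpoint (the customer/order names here are hypothetical, not from the DMBOK), the physical model is ultimately realized as concrete table-building statements:

```python
import sqlite3

# Hypothetical logical model: "a Customer places zero or more Orders",
# forward-engineered into a physical model via DDL statements.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,   -- surrogate key
    customer_name TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL            -- one customer -> many orders
);
""")
```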

Reverse engineering: the process of documenting an existing database. Physical data modeling is usually the first step, to understand the technical design of the existing system; logical data modeling is the second step, to document the business solution the existing system implements; conceptual data modeling is the third step, to document the scope and key terminology of the existing system.
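Conversely, the first reverse-engineering step can start from catalog introspection. A sketch for SQLite (other DBMSs expose similar metadata through information_schema; the function name is ours):

```python
import sqlite3

def document_physical_model(conn: sqlite3.Connection) -> dict:
    """Recover the physical model of an existing database:
    every table with its column names, types, and primary-key flags."""
    model = {}
    for (table,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"):
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
        model[table] = [(c[1], c[2], bool(c[5])) for c in cols]
    return model
```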

*This activity involves many more details; consult the DMBOK again for a detailed operational guide when executing it.

3. Review the data model: continuous improvement through model quality evaluation, followed by formal release.

4. Maintain the data model: data models need to be kept up to date. A good practice when maintaining data models is to reverse engineer the latest physical data model and make sure it remains consistent with the corresponding logical data model. Many data modeling tools can compare physical and logical models automatically.
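A hedged sketch of that consistency check, assuming the logical model is kept as a simple mapping from entity names to attribute sets (real modeling tools compare far more than column names):

```python
def compare_models(logical: dict, physical: dict) -> list:
    """Report drift between a logical model {entity: set_of_attributes}
    and a reverse-engineered physical model {table: set_of_columns}."""
    issues = []
    for entity, attrs in logical.items():
        cols = physical.get(entity)
        if cols is None:
            issues.append(f"no table implements entity '{entity}'")
            continue
        for a in attrs - cols:
            issues.append(f"{entity}: attribute '{a}' has no column")
        for c in cols - attrs:
            issues.append(f"{entity}: column '{c}' is undocumented in the logical model")
    return issues

# Example: flags the extra physical column 'created_at'.
print(compare_models({"customer": {"customer_id", "customer_name"}},
                     {"customer": {"customer_id", "customer_name", "created_at"}}))
```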

4. Metrics

You can refer to the Data Model Scorecard described in the DMBOK to formulate a scorecard and evaluation metrics suited to your own enterprise.

5. Core Concepts of Data Modeling

1. Entity, relationship, attribute, domain

1.1. Entities

An entity is defined as a thing that is distinguishable from all other things; it is the carrier about which an organization collects information. An entity can be thought of as the answer to a basic question: who, what, when, where, why, how, or some combination of these.

Entities are named differently at the different model levels:

Conceptual model: concept/term

Logical model: entity

Physical model: table

Note also the distinction between an entity type, an entity, and an entity instance.

1.2. Relationships

Relationships are associations between entities. Relationships capture high-level interactions between conceptual entities, detailed interactions between logical entities, and constraints between physical entities.

Relationships have intrinsic properties such as cardinality and arity:

Cardinality: whether the relationship is one-to-one, one-to-many, or many-to-many.

Arity: the number of entities involved in the relationship; unary, binary, ternary, and so on.

1.3. Attributes

Attributes are properties that define, describe, or measure some aspect of an entity.

An attribute (or set of attributes) that uniquely identifies an entity instance is called an identifier, also known as a key.

By structure: simple key (a single attribute), compound key (a set of two or more attributes that together uniquely identify an instance), composite key (a compound key combined with other keys or attributes), and surrogate key (also a simple key; a system-generated unique identifier for a table, typically an auto-incrementing ID).

By function: candidate key (a minimal set of one or more attributes that identifies an entity instance), primary key (the candidate key chosen as the unique identifier of the entity), super key (any set of attributes that uniquely identifies an entity instance), and alternate key (a candidate key not chosen as the primary key). In practice, the primary key is usually a surrogate key, while an alternate key serves as the business key.
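To make these distinctions concrete, a small sketch (hypothetical tables): the primary key is a surrogate, the business key survives as an alternate key enforced by a UNIQUE constraint, and a compound key identifies rows in an associative table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id  INTEGER PRIMARY KEY,  -- surrogate key chosen as primary key
    badge_number TEXT NOT NULL UNIQUE, -- business key kept as alternate key
    full_name    TEXT NOT NULL
);
-- Compound key: no single attribute identifies a row; the pair does.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL,
    course_id  INTEGER NOT NULL,
    grade      TEXT,
    PRIMARY KEY (student_id, course_id)
);
""")
```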

1.4. Domain

A domain is the complete set of possible values that can be assigned to an attribute; it is also called the attribute's value range.
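In a physical model, a domain can be enforced with a CHECK constraint; a tiny hypothetical sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The domain of 'status' is the value set {'open', 'pending', 'closed'}.
conn.execute("""
CREATE TABLE ticket (
    ticket_id INTEGER PRIMARY KEY,
    status    TEXT NOT NULL CHECK (status IN ('open', 'pending', 'closed'))
)""")
```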

2. Common data modeling methods

2.1. Relational Modeling

Relational modeling is a systematic way of organizing data that expresses meaning clearly and effectively reduces storage redundancy; it is especially well suited to designing operational systems. The most common notation is Information Engineering (IE), which uses a three-pronged "crow's foot" line end to represent cardinality.

2.2. Dimensional Modeling

The central idea of dimensional modeling is to organize data so as to optimize querying and analysis over very large data volumes.

Dimensional modeling mainly includes the following concepts:

    • Fact table: rows that hold specific numeric measurements, such as amounts or transaction volumes. Fact tables often occupy most of the space in the database.
    • Dimension table: represents the important objects of the business and mainly contains textual descriptions, such as user information or region information.
    • Grain: the meaning or description of a single row of data in the fact table, e.g., one row per date, region, and user.
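Putting the three concepts together, a minimal star-schema sketch (hypothetical names) whose fact table has a declared grain of one row per user, per region, per day:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: important business objects, mostly descriptive text.
CREATE TABLE dim_user   (user_id   INTEGER PRIMARY KEY, user_name   TEXT);
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE dim_date   (date_id   INTEGER PRIMARY KEY, full_date   TEXT);

-- Fact table: numeric measures at the declared grain
-- (one row per user, per region, per day).
CREATE TABLE fact_sales (
    user_id   INTEGER REFERENCES dim_user(user_id),
    region_id INTEGER REFERENCES dim_region(region_id),
    date_id   INTEGER REFERENCES dim_date(date_id),
    amount    REAL,     -- measure: sales amount
    txn_count INTEGER   -- measure: transaction volume
);
""")
```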

2.3. Non-relational Databases

NoSQL stands for "Not Only SQL". The name refers not to how the database is queried but to how the data is stored.

There are generally four categories: document databases, key-value databases, columnar databases, and graph databases.
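To see how the four storage models differ, here is a rough sketch of the same customer record in each shape, with plain Python structures standing in for real engines:

```python
# Document database (MongoDB-style): one nested, self-describing document.
document = {"_id": "c42", "name": "Ada", "orders": [{"sku": "X1", "qty": 2}]}

# Key-value database: an opaque value stored behind a single key.
key_value = {"customer:c42": '{"name": "Ada"}'}

# Columnar / column-family database: values grouped by column family per row key.
column_family = {"c42": {"profile": {"name": "Ada"}, "orders": {"X1": 2}}}

# Graph database: nodes plus explicit, typed edges.
nodes = {"c42": {"label": "Customer"}, "X1": {"label": "Product"}}
edges = [("c42", "ORDERED", "X1")]
```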

3. Relational and dimensional models at the three levels

3.1. Conceptual Data Model (CDM)

A collection of related subject areas that describes high-level data requirements. The conceptual data model includes only the basic and critical business entities in a given domain and function, along with descriptions of the entities and of the relationships between them.

3.2. Logical Data Model (LDM)

A detailed description of data requirements, usually in the context of supporting a specific usage (such as application requirements). The logical model is not bound to any technology or specific implementation constraints and is usually extended from the conceptual data model.

3.3. Physical Data Model (PDM)

Describes a detailed technical solution, usually built on a logical model and matched to a particular class of system hardware, software, and network tools. Physical models are technology-specific.

[Figures: examples of relational and dimensional models at the conceptual, logical, and physical levels.]

*Because the physical data model is constrained by the implementation technology, retrieval performance is often improved by combining structures (denormalization), as with the student and school entities in the figures above.

4. Normalization

Normalization is the process of applying rules to transform complex business data into stable data structures. The basic goal of normalization is to ensure that each attribute appears in only one place, eliminating redundancy and the inconsistencies redundancy can cause. The normal forms include:

1) First Normal Form (1NF). Ensures each entity has a valid primary key and every attribute depends on the primary key; eliminates repeating groups and ensures each attribute is atomic (it cannot hold multiple values). First normal form includes resolving many-to-many relationships through additional entities, often called associative entities.

2) Second Normal Form (2NF). Ensures each entity has a minimal primary key and every attribute depends on the complete primary key.

3) Third Normal Form (3NF). Ensures each entity has no hidden primary keys and that no attribute depends on any attribute outside the key (each attribute depends only on the full primary key).

In practice, reaching third normal form is generally sufficient; 4NF and 5NF rarely arise.

DAMA's explanation of normal forms is somewhat formal. To aid understanding, consider the worked example below.
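A hypothetical order table, normalized step by step (the table and column names are ours, chosen only for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Unnormalized starting point (shown as a comment, not created):
--   orders(order_id, customer_id, customer_city, items = 'X1:2, X7:1')
-- 'items' packs repeated values into one column, and customer_city
-- depends on customer_id rather than on the key of orders.

-- 1NF: atomic attributes; the repeating group becomes an associative entity.
CREATE TABLE orders     (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_item (order_id INTEGER, sku TEXT, qty INTEGER,
                         PRIMARY KEY (order_id, sku));

-- 2NF: every attribute depends on the whole key. A product's name depends
-- only on sku, not on (order_id, sku), so it moves to its own table.
CREATE TABLE product (sku TEXT PRIMARY KEY, product_name TEXT);

-- 3NF: no attribute depends on a non-key attribute. customer_city depends
-- on customer_id, not on order_id, so it moves to the customer table.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_city TEXT);
""")
```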

6. Key concepts/tools/methods

1. Data modeling and design quality management

1.1. Develop data modeling and design standards

As mentioned earlier, data modeling and database design standards provide guidelines for meeting business data requirements, conforming to enterprise and data architecture standards, and ensuring data quality. Data modeling and database design standards should include the following:

1) List and description of standard data modeling and database design deliverables.

2) A list of standard names, acceptable abbreviations, and abbreviation rules for uncommon words that apply to all data model objects.

3) A list of standard naming formats for all data model objects, including attributes and classifiers.

4) A list and description of the standard methods used to create and maintain these deliverables.

5) List and description of data modeling and database design roles and responsibilities.

6) A list and description of all metadata attributes captured in data modeling and database design, including business and technical metadata. For example, guidelines can set the expectation that a data model captures data lineage for each attribute.

7) Metadata quality expectations and requirements (see Chapter 13).

8) Guidelines on how to use data modeling tools.

9) Guidelines for preparing and leading design reviews.

10) Guidelines for data model versioning.

11) A list of things that are prohibited or to be avoided.

1.2. Review data model and database design quality

Assemble a panel of experts with diverse backgrounds, skills, expectations, and opinions to review data models and database designs. When forming the review panel, it may be necessary to recruit experts in the relevant domains through dedicated channels. Participants must be able to discuss different points of view and ultimately reach group consensus without personal conflict, since all participants share the common goal of promoting the most practical, best performing, and most usable design.

1.3. Manage data model versioning and integration

Data models and other design specifications require careful change control, just as requirements specifications and other SDLC deliverables do. Note that every change to a data model needs to be documented on a timeline. If a change affects the logical data model, such as a new or changed business data requirement, a data analyst or architect must review and approve the change to the model. Every change should be documented, recording the following (a sketch of such a change record follows the list):

1) Why the project or situation required the change.

2) What was changed and how, including which tables were added and which columns were modified or deleted.

3) When the change was approved and when it was applied to the model (not necessarily when the change was implemented in the system).

4) Who made the change.

5) Where the change was made (in which models).
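One lightweight way to capture those five facts (a sketch of our own, not a format prescribed by the DMBOK) is a structured change record:

```python
from dataclasses import dataclass

@dataclass
class ModelChangeRecord:
    why: str            # project or situation that required the change
    what_and_how: str   # object changed and how (tables added, columns modified/deleted)
    when_approved: str  # when the change was approved
    when_applied: str   # when it was applied to the model
    who: str            # person who made the change
    where: str          # which models contain the change

record = ModelChangeRecord(
    why="New requirement: support multiple shipping addresses per customer",
    what_and_how="Added table customer_address; removed column customer.address",
    when_approved="2023-06-01",
    when_applied="2023-06-05",
    who="J. Analyst",
    where="Order-management logical and physical models",
)
```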

2. Industry data model

Industry data models are pre-built data models for entire industries, including healthcare, telecommunications, insurance, banking, manufacturing, and more. These models are usually broad in scope and detailed in content; some contain thousands of entities and attributes. Industry data models can be purchased from vendors or obtained through industry organizations such as ARTS (retail), SID (telecommunications), or ACORD (insurance).

Any purchased data model will need to be customized to fit the organization's particularities, since it was designed around the needs of other organizations. The level of customization required depends on how close the model is to your organization's needs and how detailed its most important parts are. In some cases, industry models serve as working references that help modelers produce more complete models; in others, they merely save data modelers the effort of entering some common elements.

3. Best practices in database design

When designing and building databases, DBAs should keep the following PRISM design principles in mind:

1) Performance and Ease of Use. Maximize the business value of your applications and data by ensuring users have fast and easy access to data.

2) Reusability. It should be ensured that the database structure can be reused by multiple applications where appropriate and can be used for multiple purposes (such as business analysis, quality improvement, strategic planning, customer relationship management and process improvement). Avoid coupling databases, data structures, or data objects into a single application.

3) Integrity. Regardless of context, data should always have valid business meaning and value and should always reflect a valid state of the business. Enforce data integrity constraints as close to the data as possible, and immediately detect and report violations of them (see the sketch after this list).

4) Security. Accurate, truthful data should always be available in a timely manner to authorized users, and only to authorized users. The privacy requirements of all stakeholders, including customers, business partners, and government regulators, must be met. Enforce data security as you enforce data integrity: check security constraints as close to the data as possible and report any violations immediately.

5) Maintainability. Ensure that the cost of creating, storing, maintaining, using, and disposing of data does not exceed its value to the organization; perform all data work at a cost that yields value; and ensure the organization can respond as quickly as possible to changed business processes and new business requirements.
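As a small illustration of the Integrity principle, a sketch (hypothetical schema) that pushes constraints into the database itself so violations are detected and reported immediately:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    balance    REAL NOT NULL CHECK (balance >= 0)  -- integrity rule at the data
);
CREATE TABLE transfer (
    transfer_id INTEGER PRIMARY KEY,
    account_id  INTEGER NOT NULL REFERENCES account(account_id)
);
""")
try:
    conn.execute("INSERT INTO account VALUES (1, -50.0)")  # violates the CHECK
except sqlite3.IntegrityError as err:
    print(f"integrity violation reported immediately: {err}")
```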
