【Data Management】What is data management?

Foreword

Data management is the management of data resources. According to DAMA (the Data Management Association International), data resource management is dedicated to developing appropriate architectures, policies, practices, and procedures for handling the enterprise data life cycle. This is a high-level, broad definition, and it does not necessarily bear directly on the concrete operations of data management, such as the technical administration of relational databases.

Common content

Data management most commonly includes the following:

  • data analysis
  • data modeling
  • database management
  • data warehousing
  • data mining
  • Data Security
  • data integration
  • data movement
  • Data Quality Assurance
  • Metadata management (data repositories and their management)
  • Strategic Data Architecture

Subject areas

According to the DAMA DMBOK (the DAMA Guide to the Data Management Body of Knowledge, DAMA DMBOK®), the field of data management comprises the following areas:

  • Data Governance: Data Assets, Data Governance
  • Data architecture, data (model) analysis and design: data architecture, data analysis, data modeling
  • Database Administration: Data Maintenance, Database Administration, Database Management Systems
  • Data security management: data access management, data erasure management, data privacy, data security
  • Data Quality Management: Data Cleansing, Data Integrity, Data Enrichment, Data Quality, Data Quality Assurance
  • Reference and master data management: data integration, master data management, reference data
  • Data Warehousing and Business Intelligence Management: Business Intelligence, Data Marts, Data Mining, Data Movement (Extraction, Transformation and Loading), Data Warehousing
  • Documents, Records and Content Management: Document Management System (DMS), Records Management
  • Metadata management: metadata management, metadata discovery, metadata publishing, metadata registration
  • Contact Data Management: Business Continuity Planning, Marketing Operations, Customer Data Integration, Identity Management, Identity Theft, Data Theft, ERP Software, CRM Software, Address (Geography), Zip Code, Email Address, Phone Number

Types of data

Based on its level of description, its place in the business flow, and its usage, data can be classified into the following types:

  • Metadata
  • Reference Data
  • Master Data
  • Transactional Data

Metadata

Metadata is data that describes other data, or structured data that provides information about a resource.

Metadata describes objects such as information resources or data. Its purposes are to identify resources; evaluate resources; track how resources change during use; manage large volumes of networked data simply and efficiently; and support the effective discovery, lookup, integrated organization, and management of information resources in use.

Common examples of metadata:

  • Book Cataloging Information
  • EXIF information of the photo
  • Registration Information Form
  • Douban movie information
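
As a small illustration of "data that describes other data", the sketch below writes a temporary file and then reads the descriptive information the file system keeps about it. The file content and the keys of the `metadata` dictionary are assumptions made for this example; only `os.stat` and its fields come from the Python standard library.

```python
import os
import tempfile
from datetime import datetime, timezone

# The data itself: a few bytes written to a temporary file.
content = b"hello, data management"

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(content)
    path = f.name

# The metadata: facts *about* the file, not the bytes it contains.
info = os.stat(path)
metadata = {
    "name": os.path.basename(path),
    "size_bytes": info.st_size,
    "modified_at": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
}

os.remove(path)  # clean up the temporary file
print(metadata)
```

The EXIF block of a photo or the catalog record of a book plays the same role: it can be read and indexed without opening the resource it describes.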

Reference data

Reference data provides general definitions that describe the scope and meaning of data. It specifies the range of values that a field described by metadata may take. The data dictionary we consult when designing a table is often reference data. For example, if a gender field may only be male or female, then "male" and "female" are reference data; the reference data for a country field is the list of more than 100 countries and regions in the world.

Common examples of reference data:

  • Gender: male, female, other
  • Order status
  • Product size, color, operating system
  • The publishing status of a video
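
One common way to hold reference data in code is an enumeration, which makes the permitted values explicit and easy to check against. The following is a minimal sketch, not a prescribed design; the `OrderStatus` values and the `is_valid` helper are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Type


class Gender(Enum):
    """Reference data: the closed set of values a gender field may take."""
    MALE = "male"
    FEMALE = "female"
    OTHER = "other"


class OrderStatus(Enum):
    """Hypothetical order statuses, chosen only for illustration."""
    CREATED = "created"
    PAID = "paid"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


def is_valid(value: str, reference: Type[Enum]) -> bool:
    """Return True if value is one of the codes defined by the reference data."""
    return value in {member.value for member in reference}


print(is_valid("female", Gender))         # True
print(is_valid("refunded", OrderStatus))  # False
```

In a database, the same idea usually appears as a lookup table referenced by a foreign key, or as a check constraint on the column.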

Master data

Master data refers to business entities such as users, products, orders, shopping carts, articles, and videos. It is used to exchange information across departments and systems.

The goal of master data is to model business entities, that is, to define which attributes and behaviors each entity has, and to keep the data for each business entity consistent across systems.

Common examples of master data:

  • Product information and user information in e-commerce
  • News articles on a news site
  • Videos and content creators on video sites
  • Merchants in B2B
  • Shops on a food delivery platform

Transactional data

Transactional data is data generated by activities between master data entities. For example, the record of a customer buying a product is transactional data, and so is the record of a user following or tipping a streamer.

Common examples of transactional data:

  • Orders and payments generated by e-commerce purchases
  • Likes and gifts that users send to streamers on a live-streaming platform
  • Follow actions on social networking sites
  • Chat messages and public posts from users of IM tools

Relationships and characteristics

Characteristics of Data Types

The four data types can be compared along the following dimensions:

  • Data volume, update frequency: reference data < metadata < master data < transaction data
  • Life cycle, data quality: reference data > metadata > master data > transaction data

Relationships Between Data Types

Typical cases:

  • Metadata, master data, and transactional data all use reference data
  • Master data contains metadata
  • Transactional data records the activity between master data entities
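
The relationships listed above can be sketched with a few small types. This is only an illustration under assumed names: `User`, `Product`, `Order`, and their fields are hypothetical and do not come from any particular system.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class OrderStatus(Enum):
    """Reference data: the allowed order states (hypothetical values)."""
    CREATED = "created"
    PAID = "paid"


@dataclass(frozen=True)
class User:
    """Master data: a business entity shared across systems."""
    user_id: int
    name: str


@dataclass(frozen=True)
class Product:
    """Master data: another business entity."""
    product_id: int
    title: str
    price_cents: int


@dataclass
class Order:
    """Transactional data: an activity between a user and a product."""
    order_id: int
    user_id: int         # points at master data
    product_id: int      # points at master data
    quantity: int
    status: OrderStatus  # uses reference data
    created_at: datetime


users = {1: User(1, "Alice")}
products = {10: Product(10, "Keyboard", 4999)}
order = Order(100, 1, 10, 2, OrderStatus.PAID, datetime(2023, 8, 16, 12, 0))

# The transaction only makes sense by resolving the master data it refers to.
total_cents = products[order.product_id].price_cents * order.quantity
print(users[order.user_id].name, total_cents)  # Alice 9998
```

The order stores only identifiers and a status code, so the same user and product records can be reused by every transaction that involves them.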

GIGO

Garbage in, garbage out (GIGO) is an expression from computer science and information and communication technology: if wrong or meaningless data is fed into a computer system, the computer will naturally output wrong and meaningless results. The same principle applies in fields outside computing.

In statistics, if the raw data analyzed are wrong and inaccurate, then the statistical conclusions will not be credible.
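
A tiny numeric sketch shows the effect. The ages below are made up, and the use of -999 as a "missing value" marker is an assumption for illustration; a single unfiltered marker is enough to make the average meaningless.

```python
from statistics import mean

# Hypothetical survey ages; -999 stands in for a missing answer.
raw_ages = [34, 29, -999, 41, 38]

# Garbage in: the sentinel is treated as a real age.
garbage_mean = mean(raw_ages)

# Filter to a plausible range before computing the statistic.
clean_ages = [age for age in raw_ages if 0 <= age <= 120]
clean_mean = mean(clean_ages)

print(garbage_mean)  # -171.4
print(clean_mean)    # 35.5
```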

Data Quality Assessment

To avoid problems such as GIGO, evaluate data quality from five angles:

  • Integrity (completeness): covers four kinds of gaps: missing entities, missing attributes, missing records, and missing field values;
  • Accuracy: the degree to which a data value agrees with a value accepted as correct, or how far it deviates from that value;
  • Rationality: whether the format, type, value range, and business rules of the data are reasonable and valid;
  • Consistency: whether data differs or conflicts between systems, whether business metrics are defined uniformly, and whether data processing logic produces consistent results;
  • Timeliness: whether data warehouse ETL and application displays are timely and fast, including job run time, run quality, and the timely execution of dependent jobs.
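
Some of these angles can be turned into simple ratios over a batch of records. The sketch below covers completeness, rationality, and timeliness only; the record layout, the `ALLOWED_STATUS` set, and the 24-hour freshness threshold are all assumptions made for the example, not rules from the text.

```python
from datetime import datetime, timedelta

ALLOWED_STATUS = {"created", "paid", "shipped"}  # hypothetical reference data

# Hypothetical order records with deliberate quality problems.
records = [
    {"order_id": 1, "status": "paid", "amount": 120.0, "updated_at": datetime(2023, 8, 16, 9, 0)},
    {"order_id": 2, "status": "unknown", "amount": 80.0, "updated_at": datetime(2023, 8, 16, 9, 5)},
    {"order_id": 3, "status": "paid", "amount": None, "updated_at": datetime(2023, 8, 14, 9, 0)},
    {"order_id": 4, "status": "shipped", "amount": -5.0, "updated_at": datetime(2023, 8, 16, 8, 0)},
]


def completeness(rows, field):
    """Share of rows where the field is present and not None."""
    return sum(r.get(field) is not None for r in rows) / len(rows)


def rationality(rows):
    """Share of rows with an allowed status and a non-negative amount.

    Missing amounts are not penalized here; completeness already counts them.
    """
    def ok(r):
        amount_ok = r["amount"] is None or r["amount"] >= 0
        return r["status"] in ALLOWED_STATUS and amount_ok
    return sum(ok(r) for r in rows) / len(rows)


def timeliness(rows, now, max_age):
    """Share of rows updated within max_age of now."""
    return sum(now - r["updated_at"] <= max_age for r in rows) / len(rows)


now = datetime(2023, 8, 16, 10, 0)
print(completeness(records, "amount"))                # 0.75
print(rationality(records))                           # 0.5
print(timeliness(records, now, timedelta(hours=24)))  # 0.75
```

Accuracy and consistency need something to compare against, such as a trusted source or a second system, so they are left out of this sketch.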

Data Quality Management

The design of a data quality module mainly covers monitored-object management, check-metric management, data quality process monitoring, issue tracking, optimization recommendations, knowledge base management, and system management. Process monitoring includes both offline and real-time data monitoring. Issue tracking forms a closed loop: issue discovery (by automatic checks or manual entry), issue reporting, task dispatch, fault grading, fault handling, and capturing the resolution in the knowledge base.

Data governance

DAMA defines data governance as the collection of activities (planning, monitoring, and enforcement) that exercise authority and control over the management of data assets. The data governance function guides how the other data management functions are performed. This definition is somewhat vague. As I understand it, data governance is essentially priority management plus process management. Priority management means ranking the various data management issues by priority. Process management concerns people, roles, and responsibilities: who holds which role and is responsible for which issue. For example, when data is found to be missing, how is the problem prioritized, and who will resolve it?
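
Read this way, the core of data governance can be pictured as a lookup from issue type to a priority and an owner. The sketch below is one possible rendering of that idea; the issue names, priorities, and team names are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int  # 1 is the most urgent
    owner: str     # role or team responsible for the issue


# Hypothetical governance rules: who handles what, and how urgently.
RULES = {
    "missing_data": Rule(1, "data engineering"),
    "inconsistent_definition": Rule(2, "data steward"),
    "late_report": Rule(3, "BI team"),
}
DEFAULT_RULE = Rule(3, "data governance board")


def triage(issue_types):
    """Pair each issue with its rule and sort by priority (stable for ties)."""
    pairs = [(issue, RULES.get(issue, DEFAULT_RULE)) for issue in issue_types]
    return sorted(pairs, key=lambda pair: pair[1].priority)


for issue, rule in triage(["late_report", "missing_data", "schema_drift"]):
    print(rule.priority, issue, "->", rule.owner)
```

An issue type with no rule of its own falls back to `DEFAULT_RULE`, so every problem still has a named owner.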

Data Security

News of user data leaks from one site or another appears on the Internet regularly. In some cases, database connection details were even committed directly to GitHub, which allowed the database to be copied. These incidents all stem from failures in data security. In my view, data security has to be addressed on both the technical and the institutional side. Technically, data must be protected during storage, transmission, use, and backup to prevent leaks. Institutionally, a sound mechanism for data access control and permission management must be established.
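
One basic technical safeguard against the GitHub scenario is to keep connection details out of the source code and read them from the environment at runtime. The following is a minimal sketch; the variable names `DB_HOST`, `DB_USER`, and `DB_PASSWORD` and the sample values are assumptions for illustration.

```python
import os


def load_db_config(env=os.environ):
    """Build database connection settings from environment variables.

    The variable names used here are assumptions, not a standard.
    """
    required = ("DB_HOST", "DB_USER", "DB_PASSWORD")
    missing = [key for key in required if key not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["DB_HOST"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
    }


def redacted(config):
    """Return a copy of the config that is safe to log (password masked)."""
    return {**config, "password": "***"}


# A plain dict stands in for the real environment in this example.
config = load_db_config({"DB_HOST": "db.internal", "DB_USER": "app", "DB_PASSWORD": "s3cret"})
print(redacted(config))
```

Nothing secret is left in the repository this way, and masking the password before logging avoids leaking it through log files instead.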

Origin blog.csdn.net/u011397981/article/details/132310605