A pain point in data platform construction: how to carry out metadata governance

1. What is metadata

1.1. Introduction to Metadata

Metadata, also known as intermediary data or relay data, is data that describes data (data about data). It mainly describes the attributes (properties) of data and supports functions such as indicating storage location, recording history, resource search, and file recording.

1.2. Metadata classification

Metadata type       | Content                 | Content source                    | Asset management supported
--------------------|-------------------------|-----------------------------------|---------------------------------------------------------
Technical metadata  | Tables                  | MySQL, ES, Hive, ClickHouse, etc. | Asset map
                    | Operations              | ETL, DataX, SQL, QUERY            |
Production metadata | Production              | Scheduling system / Yarn          | Data quality, cost governance
Business metadata   | Data warehouse layering | Modeling specification            | Asset value, security governance, standardized governance
                    | Data classification     | Business                          |
                    | Indicator correlation   | Indicator system                  |
                    | Application information | BI dashboards, data reports       |
                    | Privacy rating          | Business                          |
Derived metadata    | Storage metering        | ClickHouse, ES, HDFS, MQ          | Cost governance, asset value
                    | Access metering         | SQL log                           |
Lineage metadata    | Table lineage           | Flink, DataX, ETL                 | Asset map, impact analysis
                    | Field lineage           | SQL log, HOOK                     |
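
To make the classification more concrete, here is a minimal Python sketch of how one table's metadata might be grouped by category. The class names (`MetadataCategory`, `TableMetadata`), the keys, and the sample values are all assumptions for illustration; they do not come from any particular metadata platform.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetadataCategory(Enum):
    """The five metadata categories from the classification table."""
    TECHNICAL = "technical"
    PRODUCTION = "production"
    BUSINESS = "business"
    DERIVED = "derived"
    LINEAGE = "lineage"


@dataclass
class TableMetadata:
    """Hypothetical record that groups one table's metadata by category."""
    name: str
    source: str  # storage engine, e.g. "hive", "mysql", "clickhouse"
    entries: dict = field(default_factory=dict)

    def put(self, category: MetadataCategory, key: str, value) -> None:
        # Create the per-category dict on first use, then store the value.
        self.entries.setdefault(category, {})[key] = value

    def get(self, category: MetadataCategory, key: str, default=None):
        return self.entries.get(category, {}).get(key, default)


# Sample values are invented for the example.
orders = TableMetadata(name="dwd.orders", source="hive")
orders.put(MetadataCategory.BUSINESS, "warehouse_layer", "DWD")
orders.put(MetadataCategory.BUSINESS, "privacy_rating", "internal")
orders.put(MetadataCategory.DERIVED, "storage_bytes", 5 * 1024**3)
orders.put(MetadataCategory.LINEAGE, "upstream", ["ods.orders_raw"])
```

Keeping the category as an explicit key makes it easy to route each kind of metadata to the governance function it supports, for example derived metadata to cost governance.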

2. Why metadata governance is needed

Metadata governance makes data specifications more standardized, improves data quality, makes the data directory structure and data assets clearer, and keeps data costs under control.

The core of data management is building a metadata platform, which uses metadata to support the upper-layer data management applications.

3. Current status of metadata governance construction

  1. Little or no metadata information
  2. There are no standard data access specifications or data development guidelines, so it is difficult to open up data and control access permissions
  3. Data quality is poor, and data anomalies are difficult to monitor
  4. Data assets are unclear: when you need some data, you do not know whether it exists or is available
  5. Data cost estimation is difficult

Capabilities that a metadata platform should provide:

4. Metadata application

4.1. Data Map: Metadata Search and Discovery

  • Search by table, field, description, data warehouse layer, data classification, label, department, and other information
  • Global metadata search
  • Search for indicators, dimensions, dashboards, and similar information

Problem solved: data asset management is chaotic, data classification is unclear, and nobody knows which data assets exist.

Technical solution: no particular difficulty.
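
As a rough illustration of the search capability, the sketch below does a case-insensitive keyword match over the table name, description, fields, layer, classification, labels, and department. `CatalogEntry`, `search`, and the sample catalog are hypothetical; a production data map would more likely sit on a search engine such as ES than on an in-memory scan.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical data map entry for one table."""
    table: str
    description: str = ""
    fields: list = field(default_factory=list)
    layer: str = ""
    classification: str = ""
    labels: list = field(default_factory=list)
    department: str = ""

    def searchable_text(self) -> str:
        # Flatten every searchable attribute into one lowercase string.
        parts = [self.table, self.description, self.layer,
                 self.classification, self.department]
        parts += self.fields + self.labels
        return " ".join(parts).lower()


def search(catalog, keyword):
    """Return the table names whose metadata contains the keyword."""
    needle = keyword.lower()
    return [e.table for e in catalog if needle in e.searchable_text()]


# Invented sample data.
catalog = [
    CatalogEntry("dwd.orders", "Cleaned order facts",
                 ["order_id", "user_id", "amount"],
                 layer="DWD", classification="trade",
                 labels=["core"], department="sales"),
    CatalogEntry("ods.users_raw", "Raw user registrations",
                 ["user_id", "phone"],
                 layer="ODS", classification="user",
                 labels=["pii"], department="growth"),
]

print(search(catalog, "user_id"))
print(search(catalog, "PII"))
```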

4.2. Data Lineage

  • Data life cycle view
  • Data link abnormal alarm
  • Data Change Notification

Problem solved: detection of upstream and downstream data changes, data life cycle control, and full-link anomaly detection.

Technical solution: Atlas, SQL parser, Flink, hook functions
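
The following Python sketch shows the general idea behind SQL-based lineage and impact analysis: extract source-to-target table edges from `INSERT ... SELECT` statements, then walk the graph downstream from a changed table. It is deliberately naive. The regular expressions stand in for a real SQL parser (they ignore subqueries, CTEs, and comments), and the function names and sample SQL log are assumptions, not part of Atlas or any other tool.

```python
import re
from collections import defaultdict, deque

# Naive patterns standing in for a real SQL parser (assumption).
TARGET_RE = re.compile(
    r"insert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_edges(sql):
    """Return (source, target) table pairs for one INSERT ... SELECT."""
    target = TARGET_RE.search(sql)
    if target is None:
        return []
    sources = sorted(set(SOURCE_RE.findall(sql)))
    return [(src, target.group(1)) for src in sources]


def build_graph(statements):
    """Map each table to the set of tables written from it."""
    downstream = defaultdict(set)
    for sql in statements:
        for src, dst in extract_edges(sql):
            downstream[src].add(dst)
    return downstream


def impacted_tables(downstream, changed):
    """Breadth-first walk: every table affected by a change to `changed`."""
    seen, queue = set(), deque([changed])
    while queue:
        for nxt in downstream.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


# Invented SQL log entries.
sql_log = [
    "INSERT OVERWRITE TABLE dwd.orders SELECT * FROM ods.orders_raw",
    "INSERT INTO dws.user_spend SELECT u.id, SUM(o.amount) "
    "FROM dwd.orders o JOIN dim.users u ON o.user_id = u.id GROUP BY u.id",
    "INSERT INTO ads.spend_report SELECT * FROM dws.user_spend",
]
graph = build_graph(sql_log)
print(impacted_tables(graph, "ods.orders_raw"))
```

The same downstream walk can drive change notifications and link alarms: when a table changes or a job fails, the owners of every table in the result are the ones to notify.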

4.3. Cost Monitoring and Governance 

Data Storage Cost Dashboard

  • Data volume stored in each storage engine
  • Virtual machine resource consumption, such as CPU, network bandwidth, and disk
  • Resource usage trends and cost budget

Problem solved: unclear data assets and uncontrollable storage costs, so that data assets and their baseline figures become clear.

Technical solution: data instrumentation (event tracking), plus integration with the operations and maintenance system's resource usage reporting.
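
A minimal sketch of the aggregation behind such a dashboard, assuming the storage metering reports arrive as simple records with an engine name and a byte count. The record layout, the function name, and especially the unit prices are hypothetical and only serve the example.

```python
from collections import defaultdict

# Hypothetical unit prices (currency units per GB per month).
PRICE_PER_GB = {"hdfs": 0.02, "clickhouse": 0.10, "es": 0.15}


def monthly_cost_by_engine(usage_records, prices):
    """Sum reported storage per engine and estimate its monthly cost."""
    total_gb = defaultdict(float)
    for record in usage_records:
        total_gb[record["engine"]] += record["bytes"] / 1024**3
    return {engine: round(gb * prices[engine], 2)
            for engine, gb in total_gb.items()}


# Invented metering records.
usage = [
    {"engine": "hdfs", "table": "ods.orders_raw", "bytes": 500 * 1024**3},
    {"engine": "hdfs", "table": "dwd.orders", "bytes": 300 * 1024**3},
    {"engine": "clickhouse", "table": "ads.spend_report", "bytes": 40 * 1024**3},
]
costs = monthly_cost_by_engine(usage, PRICE_PER_GB)
print(costs)
```

Grouping by table or department instead of engine follows the same pattern and gives the per-owner view needed for cost budgeting.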

4.4. Data quality diagnosis

Problem solving: data cost control,

4.5. Data Storage Cost Dashboard


Origin blog.csdn.net/b379685397/article/details/127093533