1. What is metadata
1.1. Introduction to Metadata
Metadata, also known as intermediary data or relay data, is data that describes other data ("data about data"). It mainly describes the attributes (properties) of data and supports functions such as indicating storage locations, keeping historical data, resource search, and file recording.
1.2. Metadata classification
| Metadata type | Content | Content source | Supported governance applications |
|---|---|---|---|
| Technical metadata | Tables | MySQL, ES, Hive, ClickHouse, etc. | Asset map |
|  | Jobs | ETL, DataX, SQL, queries |  |
| Production metadata | Production information | Scheduling system / YARN | Data quality, cost governance |
| Business metadata | Data warehouse layering | Modeling specification | Asset value, security governance, standardization governance |
|  | Data classification | Business |  |
|  | Indicator association | Indicator system |  |
|  | Application information | BI dashboards, data reports |  |
|  | Privacy level | Business |  |
| Derived metadata | Storage metrics | ClickHouse, ES, HDFS, MQ | Cost governance, asset value |
|  | Access metrics | SQL log |  |
| Lineage metadata | Table lineage | Flink, DataX, ETL | Asset map, impact analysis |
|  | Field lineage | SQL log, hooks |  |
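To make the classification concrete, the following Python sketch models metadata entries tagged with one of the five categories from the table and groups them per asset. The class names (`MetadataType`, `MetadataEntry`, `MetadataStore`), the asset name `hive.dw.orders`, and the layer value `DWD` are hypothetical illustrations, not part of any specific platform; a real metadata platform would persist entries in a database and collect them automatically from the listed sources.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetadataType(Enum):
    """The five metadata categories from the classification table."""
    TECHNICAL = "technical"
    PRODUCTION = "production"
    BUSINESS = "business"
    DERIVED = "derived"
    LINEAGE = "lineage"


@dataclass
class MetadataEntry:
    """One attribute attached to a data asset (hypothetical model)."""
    asset: str            # fully qualified asset name, e.g. "hive.dw.orders"
    mtype: MetadataType   # category from the classification table
    key: str              # attribute name, e.g. "layer" or "storage_bytes"
    value: object         # attribute value
    source: str           # where the value was collected, e.g. "Hive", "HDFS"


@dataclass
class MetadataStore:
    """In-memory store of entries; a real platform would persist them."""
    entries: list = field(default_factory=list)

    def add(self, entry: MetadataEntry) -> None:
        self.entries.append(entry)

    def describe(self, asset: str) -> dict:
        """Group one asset's entries as {category: {key: value}}."""
        result: dict = {}
        for e in self.entries:
            if e.asset == asset:
                result.setdefault(e.mtype.value, {})[e.key] = e.value
        return result


# Usage with hypothetical values for one Hive table.
store = MetadataStore()
store.add(MetadataEntry("hive.dw.orders", MetadataType.TECHNICAL, "columns", ["id", "amount"], "Hive"))
store.add(MetadataEntry("hive.dw.orders", MetadataType.BUSINESS, "layer", "DWD", "Modeling specification"))
store.add(MetadataEntry("hive.dw.orders", MetadataType.DERIVED, "storage_bytes", 1_200_000, "HDFS"))
print(store.describe("hive.dw.orders"))
```

Grouping by category mirrors the table: the same asset accumulates technical, business, and derived attributes from different sources.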
2. Why metadata governance is needed
Metadata governance makes data specifications more standardized, improves data quality, clarifies the data directory structure and the data assets, and makes data costs more controllable.
The core of data governance is building a metadata platform, which uses metadata to support the upper-layer data governance applications.
3. Current state of metadata governance
- There is little or no metadata.
- There are no standard data access specifications or data development guidelines, so it is difficult to grant and control data access permissions.
- Data quality is poor, and data anomalies are difficult to monitor.
- Data assets are unclear: when you need some data, you do not know whether it exists or is available.
- Data costs are difficult to estimate.
Capabilities a metadata platform should provide
4. Metadata application
4.1. Data Map: Metadata Search and Discovery
- Supports searching by table, field, description, data warehouse layer, data classification, tag, department, and other information
- Global metadata search
- Supports searching for indicators, dimensions, dashboards, and similar information
Problem solved: data asset management is chaotic, data classification is unclear, and it is not clear which data assets exist.
Technical solution: no particular technical difficulty.
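Because search is the least technically demanding part, a minimal sketch fits in a few lines. The Python below builds an in-memory inverted index over the attributes listed above (table name, columns, description, layer, classification, tags, department). `TableMeta`, `DataMap`, and the sample tables are hypothetical; a production data map would more likely rely on a full-text search engine such as ES.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    """Searchable metadata for one table (field set is an assumption)."""
    name: str
    description: str = ""
    columns: list = field(default_factory=list)
    layer: str = ""            # data warehouse layer, e.g. "ODS", "DWD"
    classification: str = ""   # data classification / subject area
    tags: list = field(default_factory=list)
    department: str = ""


def tokenize(text: str) -> set:
    """Lower-case the text and split on anything that is not an ASCII letter or digit."""
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


class DataMap:
    """Minimal inverted index mapping each token to the tables containing it."""

    def __init__(self) -> None:
        self.index: dict = {}
        self.tables: dict = {}

    def add(self, meta: TableMeta) -> None:
        self.tables[meta.name] = meta
        searchable = " ".join(
            [meta.name, meta.description, meta.layer, meta.classification,
             meta.department, *meta.columns, *meta.tags]
        )
        for token in tokenize(searchable):
            self.index.setdefault(token, set()).add(meta.name)

    def search(self, query: str) -> list:
        """Return names of tables matching every query token, sorted."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in tokens))
        return sorted(hits)


# Usage with two hypothetical tables.
dm = DataMap()
dm.add(TableMeta("dwd_orders", "order detail", ["order_id", "user_id", "amount"],
                 "DWD", "trade", ["finance"], "sales"))
dm.add(TableMeta("ods_users", "raw user data", ["user_id", "email"],
                 "ODS", "user", ["pii"], "growth"))
print(dm.search("pii"))        # ['ods_users']
print(dm.search("DWD trade"))  # ['dwd_orders']
```

The query uses AND semantics (every token must match). The tokenizer only handles ASCII words and splits on underscores, so Chinese descriptions would need a word-segmenting analyzer instead.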
4.2. Data Lineage
- Data lifecycle view
- Alerts on anomalies along the data pipeline
- Data change notifications
Problem solved: detecting upstream and downstream data changes, controlling the data lifecycle, and detecting anomalies across the full data pipeline.
Technical solution: Apache Atlas, SQL parsers, Flink, and hook functions.
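At its core, table-level lineage is a directed graph of "feeds into" edges that is traversed to find everything affected by a change. The sketch below is illustrative only: it uses a naive regular expression instead of a real SQL parser (it handles only simple `INSERT ... SELECT ... FROM/JOIN` statements and ignores CTEs, subqueries, and field-level lineage), and `LineageGraph`, `extract_table_lineage`, and the table names are hypothetical. In practice the edges would come from a proper SQL parser, Atlas hooks, or job metadata from Flink and DataX.

```python
import re
from collections import defaultdict, deque

# Naive, regex-based extraction for illustration only. It recognises just
# "INSERT INTO|OVERWRITE [TABLE] <target> ... FROM <a> [JOIN <b> ...]" and
# ignores CTEs, subqueries and field-level lineage.
TARGET_RE = re.compile(r"\binsert\s+(?:overwrite|into)\s+(?:table\s+)?([\w.]+)", re.I)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.I)


def extract_table_lineage(sql: str):
    """Return (target_table, {source_tables}), or (None, set()) if no INSERT is found."""
    match = TARGET_RE.search(sql)
    if match is None:
        return None, set()
    target = match.group(1)
    sources = set(SOURCE_RE.findall(sql)) - {target}
    return target, sources


class LineageGraph:
    """Table-level lineage: edges point from a source table to the table it feeds."""

    def __init__(self) -> None:
        self.downstream = defaultdict(set)  # table -> tables that read from it
        self.upstream = defaultdict(set)    # table -> tables it reads from

    def add_sql(self, sql: str) -> None:
        target, sources = extract_table_lineage(sql)
        if target is None:
            return
        for src in sources:
            self.downstream[src].add(target)
            self.upstream[target].add(src)

    def impacted_by(self, table: str) -> list:
        """Breadth-first search for every table transitively downstream of `table`."""
        seen, queue = set(), deque([table])
        while queue:
            for nxt in self.downstream.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen)


# Usage with hypothetical Hive-style statements.
g = LineageGraph()
g.add_sql("INSERT OVERWRITE TABLE dw.dwd_orders SELECT o.id, u.name "
          "FROM ods.orders o JOIN ods.users u ON o.uid = u.id")
g.add_sql("INSERT INTO dw.ads_sales SELECT sum(amount) FROM dw.dwd_orders")
print(g.impacted_by("ods.users"))  # ['dw.ads_sales', 'dw.dwd_orders']
```

The `impacted_by` traversal is what drives change notifications and impact analysis: a schema change to `ods.users` would notify the owners of both downstream tables.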
4.3. Cost Monitoring and Governance
Data storage cost dashboard
- Volume of data stored in each storage engine
- Virtual machine resource consumption, such as CPU, network bandwidth, and disk
- Resource usage trends and cost budgets
Problem solved: unclear data assets and uncontrollable storage costs. The goal is to make data assets clear and to know exactly how much data exists and what it costs.
Technical solution: data instrumentation (tracking points), plus integration with the operations and maintenance (O&M) system to report resource usage.
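A storage cost dashboard ultimately reduces reported samples to per-engine volumes, trends, and cost estimates. The Python sketch below assumes a hypothetical sample schema (`StorageSample`) and made-up per-GB prices in `PRICE_PER_GB_MONTH`; real volumes and prices would come from the instrumentation and O&M reporting described above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class StorageSample:
    """One storage report from an instrumentation point (hypothetical schema)."""
    engine: str        # storage engine, e.g. "ClickHouse", "ES", "HDFS", "MQ"
    day: date          # reporting date
    gigabytes: float   # stored volume on that date


# Hypothetical prices (cost units per GB per month), for illustration only;
# real figures would come from the O&M or billing system.
PRICE_PER_GB_MONTH = {"ClickHouse": 0.10, "ES": 0.15, "HDFS": 0.03, "MQ": 0.05}


def storage_summary(samples: list) -> dict:
    """Per engine: latest volume, growth since the earliest sample, estimated monthly cost."""
    by_engine = defaultdict(list)
    for s in samples:
        by_engine[s.engine].append(s)
    summary = {}
    for engine, rows in by_engine.items():
        rows.sort(key=lambda s: s.day)
        first, last = rows[0], rows[-1]
        summary[engine] = {
            "latest_gb": last.gigabytes,
            "growth_gb": last.gigabytes - first.gigabytes,
            # Engines without a known price are costed at 0.
            "monthly_cost": round(last.gigabytes * PRICE_PER_GB_MONTH.get(engine, 0.0), 2),
        }
    return summary


# Usage with made-up samples.
samples = [
    StorageSample("HDFS", date(2024, 1, 1), 1000.0),
    StorageSample("HDFS", date(2024, 2, 1), 1200.0),
    StorageSample("ES", date(2024, 1, 1), 300.0),
]
for engine, stats in storage_summary(samples).items():
    print(engine, stats)
```

Growth is measured between the earliest and latest samples only; a real dashboard would keep the full time series to plot trends and compare them against the cost budget.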
4.4. Data quality diagnosis
Problem solved: data cost control,