Metadata-driven ETL

2016-07-03   Zhu Jie  

 

Definition of Metadata

Metadata is data about data: information that describes the properties of data and supports functions such as indicating storage location, tracking historical data, searching for resources, and keeping file records.

 

Definition of ETL

ETL is short for Extract-Transform-Load. It describes the process of extracting data from a source, transforming it, and loading it into a destination. The term is most commonly used in the context of data warehouses, but ETL is not limited to them.

 

ETL is an important part of building a data warehouse. Users extract the required data from the data sources, clean it, and finally load it into the data warehouse according to a predefined data warehouse model.

 

Source data often does not meet the requirements of analysis as it stands, so it has to be prepared first. This preparation process is ETL.
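The three stages described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the sample CSV, the `fact_orders` table, and the cleaning rules are all assumptions made up for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file or an upstream system.
RAW_CSV = """order_id,amount,country
1, 19.90 ,cn
2,,us
3, 5.00 ,US
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, convert types, normalise casing."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleaning step: skip incomplete records
        cleaned.append(
            (int(row["order_id"]), float(amount), row["country"].strip().upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the predefined target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run extract, transform, and load in sequence."""
    load(transform(extract(RAW_CSV)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    print(conn.execute("SELECT * FROM fact_orders").fetchall())
```

Note that every rule here (which column to clean, which table to load) is written directly into the code, which is exactly what a metadata-driven approach tries to avoid.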

 

The value of being metadata-driven

Data assets can be unified to give a global view of enterprise data. A good metadata management tool provides a global view of what data exists across the enterprise's systems and where it is located. Without metadata management tools, we can only rely on the experience of individuals, and no one can say with confidence where the data comes from or what it is used for.

 

The ETL process can be simplified. With metadata, automated tools can be built that carry out the ETL process through simple UI operations driven by that metadata. This simplifies the writing of ETL code, and ETL processes can be reused extensively.
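As a rough sketch of what "driven by metadata" can mean in code, the example below keeps one generic engine and moves everything job-specific into a metadata record. The metadata layout (`target_table`, `columns`, `transform`), the transform names, and the `dim_customer` table are hypothetical choices for illustration, not the format of any particular tool.

```python
import sqlite3

# Named transforms that metadata entries may refer to (a hypothetical set).
TRANSFORMS = {
    "strip": lambda v: v.strip(),
    "upper": lambda v: v.strip().upper(),
    "to_int": lambda v: int(v.strip()),
    "to_float": lambda v: float(v.strip()),
}

# Hypothetical metadata for one job. A UI or a metadata repository could
# produce records like this one; the engine below never changes.
JOB_METADATA = {
    "target_table": "dim_customer",
    "columns": [
        {"source": "cust_id", "target": "customer_id", "type": "INTEGER", "transform": "to_int"},
        {"source": "cust_name", "target": "name", "type": "TEXT", "transform": "strip"},
        {"source": "ctry", "target": "country", "type": "TEXT", "transform": "upper"},
    ],
}


def run_job(metadata, source_rows, conn):
    """Generic engine: create the target table and load rows as the metadata says.

    Table and column names come from the metadata, which is assumed to be
    trusted; they are interpolated into SQL and must not be user input.
    """
    cols = metadata["columns"]
    table = metadata["target_table"]
    ddl = ", ".join(f'{c["target"]} {c["type"]}' for c in cols)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({ddl})")

    # Apply the transform named in the metadata to each source field.
    rows = [
        tuple(TRANSFORMS[c["transform"]](row[c["source"]]) for c in cols)
        for row in source_rows
    ]
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    source = [{"cust_id": " 7 ", "cust_name": " Ann ", "ctry": "cn"}]
    run_job(JOB_METADATA, source, conn)
    print(conn.execute("SELECT * FROM dim_customer").fetchall())
```

Adding a new job then means adding a new metadata record rather than writing new code, which is where the reuse comes from.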

 

The difficulties of being metadata-driven

Metadata management is difficult. Data changes rapidly, and traditional manual configuration makes it hard to ensure consistency and involves a huge workload. Metadata is, in effect, an enterprise-wide data dictionary, and maintaining a complete set of metadata is similar to compiling a dictionary.

 

 

Semantic management adds to the difficulty: tables and fields with different names may have the same meaning, fields with the same name may have different meanings, and both can change from version to version.
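The semantic problem can be made concrete with a small sketch. It assumes a hypothetical semantic dictionary that maps each (table, field, version) to a business concept; all table names, field names, and concept names are invented for the example. The two helper functions report the two cases named above: one concept stored under several field names, and one field name carrying several meanings.

```python
from collections import defaultdict

# Hypothetical semantic dictionary entries: (table, field, version, concept).
SEMANTIC_ENTRIES = [
    ("crm.customer", "cust_id", 1, "customer_identifier"),
    ("billing.account", "client_no", 1, "customer_identifier"),
    ("crm.customer", "status", 1, "customer_lifecycle_status"),
    ("sales.orders", "status", 1, "order_fulfilment_status"),
    ("sales.orders", "status", 2, "order_payment_status"),
]


def latest_entries(entries):
    """Keep only the highest version of each (table, field) pair."""
    latest = {}
    for table, field, version, concept in entries:
        key = (table, field)
        if key not in latest or version > latest[key][0]:
            latest[key] = (version, concept)
    return {key: concept for key, (_version, concept) in latest.items()}


def synonyms(entries):
    """Concepts that are stored under more than one table.field."""
    by_concept = defaultdict(set)
    for (table, field), concept in latest_entries(entries).items():
        by_concept[concept].add(f"{table}.{field}")
    return {c: sorted(f) for c, f in by_concept.items() if len(f) > 1}


def homonyms(entries):
    """Field names that carry more than one meaning across tables."""
    by_name = defaultdict(set)
    for (_table, field), concept in latest_entries(entries).items():
        by_name[field].add(concept)
    return {n: sorted(c) for n, c in by_name.items() if len(c) > 1}


if __name__ == "__main__":
    print(synonyms(SEMANTIC_ENTRIES))
    print(homonyms(SEMANTIC_ENTRIES))
```

In this sketch the dictionary itself still has to be filled in by someone, which is the labour-intensive part that the management effort has to cover.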

 

 

So this work is as much a management task as a technical one. Many companies in the industry are thinking about how to reduce the difficulty of metadata management, and there is a consensus around using machine learning to identify metadata automatically, as Tamr, Huaao Data, and others do.

 

 

In addition, metadata is the foundation not only for ETL but also for data quality and data governance.

 


 

 

 

 

 
 
