DBT (Data Build Tool) Framework Overview

DBT (Data Build Tool) description:

1. In the ELT (Extract, Load, Transform) data processing flow, dbt is a framework mainly for the T step (officially, dbt only claims responsibility for the T); the L and T it performs are driven by configuration files that define how data is loaded and transformed;
     for example: the configuration files define the paths of the data files, the file format (column names), the data tests to apply (unique, not_null, accepted_values, etc.), and the data transformation templates;
     based on these configuration files, the dbt framework automates the processing: it loads the data files into the data warehouse, runs the defined data tests, performs the transformations, and produces data sets ready for analysis;
2. There are two types of configuration file, .yml and .sql: .yml files define the project information, the database connection information, the data formats, and the data tests; .sql files define the data transformation logic (minimal examples of both file types follow this list);
3. The whole data processing flow is divided into layers: a data loading layer, a data preprocessing layer, and a data mart layer; each layer has its own configuration files that define it;
4. The processing also produces documentation: the raw data is loaded into the data warehouse, the tests defined in the .yml files are run, and a description of the data quality checks is produced (Figure 1);
                                          once the whole processing run is complete, dbt can generate documentation for the data models: reports and graphs that describe the data lineage and the model definitions (Figure 2, Figure 3; the commands involved are sketched after this list);
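
To make point 2 concrete, here is a minimal sketch of a .yml file defining a model's columns and tests; the model and column names (customers, customer_id, status) are hypothetical, not taken from this post:

    # models/schema.yml -- hypothetical model and column names
    version: 2

    models:
      - name: customers
        description: "One row per customer"
        columns:
          - name: customer_id
            tests:
              - unique          # built-in dbt tests, as named above
              - not_null
          - name: status
            tests:
              - accepted_values:
                  values: ['active', 'inactive']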
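
And a minimal sketch of the matching .sql side: a mart-layer model, assuming hypothetical staging models stg_customers and stg_orders. The ref() calls declare the dependencies between models, which is how dbt works out the execution order by itself:

    -- models/marts/customer_orders.sql (hypothetical mart-layer model)
    select
        c.customer_id,
        c.status,
        count(o.order_id) as order_count
    from {{ ref('stg_customers') }} as c
    left join {{ ref('stg_orders') }} as o
        on o.customer_id = c.customer_id
    group by c.customer_id, c.status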
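
The automated processing and the documentation output described in point 4 are driven from dbt's command line; a typical sequence looks like this:

    dbt seed             # load the raw CSV files into the data warehouse
    dbt run              # execute the .sql transformation models
    dbt test             # run the tests defined in the .yml files
    dbt docs generate    # build the documentation and lineage graph
    dbt docs serve       # browse the generated docs (Figure 2, Figure 3)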
DBT advantages:
 
1. As the official website describes, developers only need to write the transformation logic; they do not have to create the tables and views themselves or worry about the execution order of the code, because the dbt framework handles both automatically (see the sketch after this list);
2. dbt comes with a defined specification and development process, which imposes useful norms on a project;
3. dbt has abstracted the common functionality, which reduces complexity and repetitive coding; for example, for data profiling the checks are defined in the configuration file and dbt outputs the results automatically, so no custom code has to be written and the difficulty of data profiling is reduced;
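
As an illustration of point 1, whether a model is materialized as a table or a view is declared in configuration rather than written as DDL by hand; a minimal sketch, reusing the hypothetical customer_orders model from above:

    -- at the top of models/marts/customer_orders.sql
    {{ config(materialized='table') }}   -- dbt generates the CREATE TABLE itself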
DBT shortcomings:
1. dbt can only load raw data from files, such as CSV files;
2. It supports only a few kinds of data warehouse, such as Postgres, Presto, and Spark; for any other warehouse, a dbt plugin has to be developed (the connection itself is configured per warehouse in profiles.yml; a sketch follows this list);
3. The data transformation logic has to be written as .sql files in the format the framework prescribes; in a real project a single model can reach hundreds of lines of SQL, and a complex project will accumulate many .sql files, which increases the complexity of organizing and managing the code;
4. In a real project the data objects have many fields, and defining them all in the .yml configuration files becomes correspondingly tedious;
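
For reference on point 2, the warehouse connection lives in profiles.yml and is specific to each supported adapter; a minimal sketch for Postgres, with every value hypothetical:

    # ~/.dbt/profiles.yml -- hypothetical Postgres connection
    my_project:
      target: dev
      outputs:
        dev:
          type: postgres
          host: localhost
          port: 5432
          user: analyst
          pass: secret        # the Postgres adapter uses the key 'pass'
          dbname: analytics
          schema: dbt_dev
          threads: 4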


Origin: www.cnblogs.com/rudy123/p/12153992.html