Report Automation: Opening the Door to the Data Warehouse

The previous article, "Report Automation: Business Intelligence behind the secret", mainly covered how I think business intelligence can be done step by step: a four-step path of report automation, data charts, data visualization, and data mining that gradually lets the data generate value.

This series is mainly about report automation, but report automation needs "a lot of data" behind it. The "a lot of data" mentioned repeatedly refers to the data our products save to databases while they are used and operated, including log data and the data produced by the services that keep the products running. Even when the data is in databases, it may be spread across many databases, tables, and files, distributed in a way that was "reasonable" for the original business functions or that people have long been complaining about.

So is the first step in report automation to connect to every product database, build entities for all the tables, and finish the job in Java, PHP, or another familiar language? Actually, no. Before that, we first have to build a "data warehouse".

Data warehouse design

There are many approaches to data warehouse design. This article covers one style, the layered DB-ODS-DW-DM design. These four terms make up the data flow shown below:

image-20200226203701292

The arrows indicate the direction of data flow. Roughly, the data flows as follows:

  • Scattered everywhere: whether in databases, logs, third-party services, or data providers, the raw data is spread across many places
  • All rivers run to the sea: data from everywhere converges at one point, and the ODS takes on the crucial step of collection
  • Sorted by kind: a chaotic pile is hard to use, so whatever mess we see when we open the door, we first tidy it up and classify it (organizing)
  • Each to their specialty: a huge warehouse covers too much information, and even once it is organized we cannot digest it all ourselves, so each kind of information needs its own specialized processing (analysis)
  • Output: after all the collection, organizing, and analysis, we finally have something of value and can start delivering it externally

DB (Database)

This refers to the data layer of our product / service systems. "Layer" is the key concept here: it may consist of multiple databases. This layer provides the data that keeps the business running. For a data warehouse, the DB layer also has to supply data: ETL tools pull data from it in batches at scheduled times to feed the ODS.

Note that this is not a simple copy of the data; it requires an ETL (Extract-Transform-Load) tool. ETL refers to data extraction, transformation, and loading. With an ETL tool you can configure how data moves from source to destination and when the job is triggered. This will be covered later.
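To make the three ETL steps concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not a real ETL tool: the table names `orders` and `ods_orders`, the columns, and the clean-up rules in `transform` are all assumptions invented for illustration, and scheduling is left out entirely.

```python
import sqlite3


def extract(source):
    """Extract: read raw rows from the (hypothetical) business table."""
    return source.execute(
        "SELECT id, name, amount_cents FROM orders ORDER BY id"
    ).fetchall()


def transform(rows):
    """Transform: trim stray whitespace and convert cents to a decimal amount."""
    return [(oid, name.strip(), cents / 100) for oid, name, cents in rows]


def load(target, rows):
    """Load: write the cleaned rows into the (hypothetical) ODS table."""
    target.executemany(
        "INSERT INTO ods_orders (id, name, amount) VALUES (?, ?, ?)", rows
    )
    target.commit()


def run_etl(source, target):
    """Run one extract-transform-load pass and return the number of rows moved."""
    rows = transform(extract(source))
    load(target, rows)
    return len(rows)


def demo():
    # In-memory databases stand in for the business DB and the ODS.
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, " widget ", 1250), (2, "gadget", 300)],
    )
    target = sqlite3.connect(":memory:")
    target.execute(
        "CREATE TABLE ods_orders (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    run_etl(source, target)
    return target.execute(
        "SELECT id, name, amount FROM ods_orders ORDER BY id"
    ).fetchall()
```

In practice a dedicated ETL tool would also handle the timed trigger; here `run_etl` would simply be called from whatever scheduler is available.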

ODS (Operational Data Store)

The ODS (Operational Data Store) holds operational data. It is the transition / intermediate layer between the business databases (DB) and the data warehouse (DW).

In fact, report automation should not read data directly from the business databases; a data warehouse needs to be built. The first step is to copy all the data from the various business databases into the "ODS intermediate layer", and everything downstream in the data warehouse gets its data through it. This has the following advantages:

  • Isolation: the business systems and the data warehouse are isolated from each other
  • Redundancy: this is the first redundant copy of the data, and it can also serve as a backup, with one large database covering all business data
  • Aggregation: it breaks down business barriers directly and brings all the data together, which helps when sorting through the data later and gives the imagination free rein
  • Reduced pressure on business systems: this is another purpose of the redundancy. Both report automation and data mining can generate a large number of database operations. Separating data analysis from the business keeps the two from affecting each other's performance, and the business databases only need to serve a one-time read

This is a unifying layer. Its main job is to synchronize the contents of all the source databases into one database, with ETL tools pulling data in from everywhere. Following the single-responsibility principle, the data structures in this layer generally do not change often, and the structure of each table stays consistent with the source database.

Since this layer only gathers data as-is, data is written to it only by ETL tools. But what happens when content in the source database changes? And what about the duplicates that each copy produces?

  • If you need to keep every copy, consider adding a time field to each table that marks when each copy was made. Both problems above can then be solved by filtering on this field.
  • If resources are limited and you care only about the current state, not the history of changes, consider clearing the ODS table before each copy and then pulling the data again.
  • If resources are limited, the data volume is very large, and you want the copy to be fast, consider adding conditions when reading the source so that unchanged data is filtered out. For example, if you update once a day, you only need the rows whose created time or updated time is today, and then upsert them into the corresponding ODS table.
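The three copy strategies above can be sketched as follows, again with Python's sqlite3 module. This is only an illustration under stated assumptions: the source table `users`, the ODS tables `ods_users` and `ods_users_history`, and the `copied_at` and `updated_at` columns are hypothetical names, and the upsert uses SQLite's `ON CONFLICT ... DO UPDATE` clause, which needs SQLite 3.24 or newer.

```python
import sqlite3

COLUMNS = "id, status, updated_at"


def snapshot_copy(src, ods, copied_at):
    """Keep every copy: append all source rows, tagged with the copy time."""
    rows = src.execute(f"SELECT {COLUMNS} FROM users").fetchall()
    ods.executemany(
        "INSERT INTO ods_users_history (id, status, updated_at, copied_at) "
        "VALUES (?, ?, ?, ?)",
        [row + (copied_at,) for row in rows],
    )
    ods.commit()


def full_reload(src, ods):
    """Current state only: clear the ODS table, then pull everything again."""
    rows = src.execute(f"SELECT {COLUMNS} FROM users").fetchall()
    ods.execute("DELETE FROM ods_users")
    ods.executemany(f"INSERT INTO ods_users ({COLUMNS}) VALUES (?, ?, ?)", rows)
    ods.commit()


def incremental_upsert(src, ods, since):
    """Large tables: pull only rows changed on or after `since`, then upsert."""
    rows = src.execute(
        f"SELECT {COLUMNS} FROM users WHERE updated_at >= ?", (since,)
    ).fetchall()
    ods.executemany(
        f"INSERT INTO ods_users ({COLUMNS}) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET "
        "status = excluded.status, updated_at = excluded.updated_at",
        rows,
    )
    ods.commit()
    return len(rows)


def demo():
    src = sqlite3.connect(":memory:")
    src.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
    )
    src.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "active", "2020-02-25"), (2, "active", "2020-02-26")],
    )
    ods = sqlite3.connect(":memory:")
    ods.execute(
        "CREATE TABLE ods_users (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
    )
    ods.execute(
        "CREATE TABLE ods_users_history "
        "(id INTEGER, status TEXT, updated_at TEXT, copied_at TEXT)"
    )

    full_reload(src, ods)
    snapshot_copy(src, ods, "2020-02-26")

    # The source changes the next day: one row is updated, one row is new.
    src.execute(
        "UPDATE users SET status = 'closed', updated_at = '2020-02-27' WHERE id = 1"
    )
    src.execute("INSERT INTO users VALUES (3, 'active', '2020-02-27')")
    changed = incremental_upsert(src, ods, "2020-02-27")
    snapshot_copy(src, ods, "2020-02-27")

    current = ods.execute(f"SELECT {COLUMNS} FROM ods_users ORDER BY id").fetchall()
    history_rows = ods.execute("SELECT COUNT(*) FROM ods_users_history").fetchone()[0]
    return changed, current, history_rows
```

The trade-off is visible in the demo: the history table grows with every copy, while the incremental upsert touches only the two rows that changed. Note that the incremental approach depends on the source reliably maintaining its updated time, and it does not notice rows deleted at the source.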

DW (Data Warehouse)

Now for the data warehouse layer of the data warehouse...

When we hear "warehouse", we can picture what a real-world warehouse looks like. A data warehouse is much the same.

The ODS in front of it accepts all information unconditionally. Perhaps the only sensible thing it can do is name each table so that the name identifies which business database the table came from. In short, it is one huge, chaotic space, and the main job of the warehouse is to bring order to it.

There are many methods for modeling ("organizing") a data warehouse: normal-form modeling (Third Normal Form, 3NF), dimensional modeling (Dimensional Modeling), entity modeling (Entity Modeling), and so on.

Whatever the method, the ultimate goals of organizing are:

  • Readability: when things are in chaos we never know what they are; after organizing, we can clearly understand the relationships between them. Dimensional modeling, for example, builds fact tables and dimension tables. Each fact table records the main business data and contains the ids of several dimensions, and each id can be used to look up more detail in the corresponding dimension table. A time dimension might hold the date and the day of the week; a type dimension might hold the type's name, a detailed description, and so on. Dimensional modeling will be covered in detail later.
  • Clear categories: after the ODS layer has jumbled all the data together, the DW layer gives us a fresh chance to link related data into valuable data domains. The division may be nothing more than table names; it does not have to mean separate databases.
  • Breaking business barriers: classification here is done mainly from the point of view of the data, which can break existing business barriers.
  • High quality: when the data is refreshed, keep what is useful; fields that are completely invalid or abandoned can be dropped. Building a high-quality data warehouse means we can delete completely useless fields at this stage, but more importantly it means verifying in detail that the information is correct and complete.
  • High efficiency: after clear classification, re-grouping by the relationships between business data, and rigorous handling of every piece of data, we have a high-quality, readable, clearly classified data warehouse in which we can find what we want much faster.
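The fact-table and dimension-table idea mentioned under readability can be sketched as a tiny star schema. This is a hedged illustration only: the tables `fact_sales`, `dim_date`, and `dim_type`, their columns, and the sample rows are all invented here and are not the schema of any real system.

```python
import sqlite3


def build_star_schema():
    """Create one fact table that references two dimension tables by id."""
    dw = sqlite3.connect(":memory:")
    dw.executescript(
        """
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,  -- e.g. 20200224
            date TEXT,
            day_of_week TEXT
        );
        CREATE TABLE dim_type (
            type_id INTEGER PRIMARY KEY,
            name TEXT,
            description TEXT
        );
        CREATE TABLE fact_sales (
            sale_id INTEGER PRIMARY KEY,
            date_id INTEGER REFERENCES dim_date(date_id),
            type_id INTEGER REFERENCES dim_type(type_id),
            amount REAL
        );
        INSERT INTO dim_date VALUES
            (20200224, '2020-02-24', 'Monday'),
            (20200225, '2020-02-25', 'Tuesday');
        INSERT INTO dim_type VALUES
            (1, 'book', 'Printed books'),
            (2, 'toy', 'Toys for children');
        INSERT INTO fact_sales VALUES
            (1, 20200224, 1, 10.0),
            (2, 20200224, 2, 5.0),
            (3, 20200225, 1, 7.5);
        """
    )
    return dw


def sales_by_weekday_and_type(dw):
    """Resolve the dimension ids in the fact table into readable attributes."""
    return dw.execute(
        """
        SELECT d.day_of_week, t.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        JOIN dim_type AS t ON t.type_id = f.type_id
        GROUP BY d.date_id, d.day_of_week, t.name
        ORDER BY d.date_id, t.name
        """
    ).fetchall()
```

The fact table stays narrow because it stores only ids and measures; descriptive details such as the day of the week live once in the dimension table and are joined in when a query needs them.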

Note that the data warehouse is just a warehouse that stores all kinds of things. It is very tidy, but it does not fully dig out the value of the data inside it; that mining has to happen in the next layer.

DM (Data Mart)

A data mart (Data Mart), also called a data market, serves the specific needs of a department or a group of users. Its data is stored in a multidimensional way, including custom dimensions, the metrics to be calculated, dimension hierarchies, and so on, producing data cubes for decision-oriented analysis.

First, why does the earlier figure show so many DM databases? A DM can be divided by line of business: different departments / services / products can each have their own dedicated database.

Second, what goes into it?

A DM database usually holds content that is output directly to the outside, such as final data that lets reports be queried quickly, as well as various indicators.

What is an indicator? Year-over-year growth rate, total monthly sales, conversion rate, average profit margin, total hits per day, and so on. A more detailed description will follow later.
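As a rough illustration of what such indicators look like in code, here is a small Python sketch of three of them. The function names and input shapes are assumptions made for this example; a real data mart would usually precompute these values from warehouse tables rather than from in-memory lists.

```python
from collections import defaultdict


def year_over_year_growth(current, previous):
    """Growth relative to the same period last year; `previous` must be non-zero."""
    return (current - previous) / previous


def conversion_rate(conversions, visits):
    """Share of visits that ended in a conversion; `visits` must be non-zero."""
    return conversions / visits


def monthly_sales_totals(sales):
    """Sum (date, amount) pairs by month; dates are 'YYYY-MM-DD' strings."""
    totals = defaultdict(int)
    for date, amount in sales:
        totals[date[:7]] += amount  # 'YYYY-MM' is the month key
    return dict(totals)
```

For example, `year_over_year_growth(120, 100)` gives a growth rate of 0.2 (20%), and `monthly_sales_totals([("2020-01-05", 100), ("2020-01-20", 200), ("2020-02-01", 150)])` groups the three sales into January and February totals.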

image-20200226211739979

In reality, a data warehouse is not layered this simply. The figure above is my rough ("imagined") sketch of a more complex data warehouse model.

The purpose of layering is simply to let each layer do its own job: collecting, organizing, analyzing. Our input is the information pulled from each data source; after the ODS swallows it whole and the DW models and consolidates it, the DM layer finally delivers more valuable data to the outside.

This article described a simple layered design for a data warehouse, mainly the ODS, DW, and DM layers. It covered only the concepts; the details will follow in later articles. For example, the next topic is the dimensional modeling method for the DW layer.

 

| Copyright: Articles on this site are licensed under the CC 4.0 BY-SA agreement. When reproducing, please attach the original source link and this statement.
| This link: Cologic Blog - Report Automation: open the door to data warehouse - https://www.coologic.cn/2020/02/1756/

Origin www.cnblogs.com/techiel/p/12535091.html