Data Warehouse Knowledge Overview (1)

In recent years, concepts such as "big data", "data-driven", and "data sets" have been enthusiastically embraced by the Internet community, and people who understand data acquisition, data processing, recommendation algorithms, and predictive modeling have been in demand. This creates the perception that these skill areas arrived with the big data era. In fact, the concepts of the data warehouse and the data-based decision support system were proposed as early as the 1980s and early 1990s. In essence, they bring data from multiple sources together and use statistical methods to analyze it in support of various enterprise decisions.

The name is new, but knowledge of data warehousing can still guide big data projects in practice. This article introduces the history and background of the data warehouse and is the first article in this data warehouse knowledge series.

01 The development of data storage

In a narrow sense, the data warehouse is also a form of data storage. Since the emergence of the computer, digital data storage has gone through the following stages.

  • 1950s
    Punch cards, the first data storage medium
  • 1960s
    Magnetic storage (tapes, disks)
  • 1970s
    The first database management software (IMS), a hierarchical database
    DBTG, the network database model
  • Early 1980s
    Relational data model, RDBMS implementations
  • Late 1980s to 1990s
    Data mining, the data warehouse (W. H. Inmon)
    1995: data warehouses become popular, with IBM's DW program and OLAP services bundled with Oracle / SQL Server
  • 2000s
    With the growth of online data, the data warehouse becomes part of BI solutions
    Combined with Big Data and NoSQL systems

02 Enterprise-level decision-making

The purpose of deploying an enterprise data warehouse is to use data to describe the current situation and to support the next action plan. Corporate decision-making can be divided into three levels, as described below.

At the bottom level are decisions that operational staff can act on directly. For example, an e-commerce logistics order that stays in the "shipped" status for a long time may be an abnormal case such as a lost parcel, and the logistics provider needs to be contacted to handle it. The middle level covers work such as sales forecasting, which predicts sales for a coming period based on historical sales records; decisions at this level are typically used within a particular department. The top level covers market actions such as identifying new products or choosing shop locations, which require the data of the entire company and even external data, for example combining geographic, demographic, and economic data to reach a decision.

Therefore, when building a data warehouse or a data platform, a more complex system is not necessarily a better one. The key is which level of decision-making it needs to support, followed by implementation factors such as difficulty and construction time.
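
As an illustration of the bottom-level case above, here is a minimal sketch of a time-out alert for orders stuck in the "shipped" status. It uses Python's built-in sqlite3 module as a stand-in for the business database; the table name, column names, sample rows, and the 7-day threshold are all assumptions made for this example, not part of the original system.

```python
import sqlite3

# Hypothetical schema: an "orders" table with a status and the date it was shipped.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, shipped_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "shipped", "2024-01-01"),
        (2, "shipped", "2024-01-09"),
        (3, "delivered", "2024-01-01"),
    ],
)


def find_stuck_orders(conn, today, max_days=7):
    """Return ids of orders that have been in 'shipped' status for more than max_days."""
    rows = conn.execute(
        """
        SELECT order_id
        FROM orders
        WHERE status = 'shipped'
          AND julianday(?) - julianday(shipped_at) > ?
        ORDER BY order_id
        """,
        (today, max_days),
    ).fetchall()
    return [row[0] for row in rows]


# Orders returned here would be handed to staff who contact the logistics provider.
stuck = find_stuck_orders(conn, today="2024-01-10")
print(stuck)
```

A single filtered query like this is cheap for a relational database, which is why this level of decision support does not by itself require a data warehouse.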

03 Limits of database technology

The "abnormal order processing" in the previous section needs only simple time-out alerts and can be implemented directly in a relational database. For top-level decision-making, however, development on a relational database may run into the following three problems.

1. Performance limitations:
   1. The same database cannot serve both transactions and BI-style statistical queries well

2. Insufficient integration:
   1. Under C/S and B/S architectures, each application has its own independent database, so data is dispersed
   2. There is also data from external sources
   3. Names, statistical definitions, and units are not unified across systems

3. Lack of methodology and tools:
   1. Optimization of statistical queries
   2. Data modeling methodology
   3. Supporting tools for statistical queries and analysis

For these reasons, a data warehouse and a business database have different underlying designs.

04 Definition of a data warehouse

A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management's decisions.

- "Data Warehouse (4th Edition)"

  • Subject-oriented: a subject is the analysis object that corresponds to a macro-level analysis area of the enterprise

    • For example, "Sales Analysis" is an analysis area

    • The analysis objects involved in "Sales Analysis" are commodities, suppliers, customers, warehouses, and so on, so the data warehouse subjects can be identified as commodity, supplier, customer, and warehouse

    • At the data level, the data of different subjects may overlap

  • Integrated

    • Data comes from multiple heterogeneous data sources
    • Data is standardized during integration
  • Nonvolatile

    • Data storage is separated from the data sources
    • Once data is written to the data warehouse, it is not updated
    • Initially, the data warehouse supports only data loading and access
  • Time-variant

    • Historical data is kept (snapshot data)

    • Data in the warehouse contains a time element (a recorded timestamp)

    • Change over time is captured by appending data for different points in time
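
The integrated, nonvolatile, and time-variant properties above can be sketched as an append-only load. This is only an illustration using Python's built-in sqlite3 module; the table wh_product_price, the source system names, and the "fen" and "yuan" units are hypothetical and chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: every load appends rows stamped with a snapshot date.
# Prices are stored in one unified unit (yuan), whatever unit the source used.
conn.execute(
    """
    CREATE TABLE wh_product_price (
        product_id    TEXT,
        price_yuan    REAL,
        source_system TEXT,
        snapshot_date TEXT
    )
    """
)

# Assumed source units: how many of each unit make up one yuan.
UNITS_PER_YUAN = {"yuan": 1, "fen": 100}


def to_yuan(amount, unit):
    """Integration step: standardize prices from different sources to yuan."""
    return amount / UNITS_PER_YUAN[unit]


def load_snapshot(conn, rows, source_system, snapshot_date):
    """Append one snapshot; existing rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO wh_product_price VALUES (?, ?, ?, ?)",
        [
            (product_id, to_yuan(amount, unit), source_system, snapshot_date)
            for product_id, amount, unit in rows
        ],
    )


# Two heterogeneous sources report the same product in different units.
load_snapshot(conn, [("p1", 1000, "fen")], "shop_db", "2024-01-01")
load_snapshot(conn, [("p1", 12.0, "yuan")], "erp_db", "2024-01-02")

# The price history is recovered from the appended snapshots.
history = conn.execute(
    "SELECT snapshot_date, price_yuan FROM wh_product_price "
    "WHERE product_id = 'p1' ORDER BY snapshot_date"
).fetchall()
print(history)
```

Because rows are only ever appended with a snapshot date, the change in price over time stays queryable instead of being overwritten.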

05 Data modeling

Since the most basic function of a data warehouse is to store data, the next question is how to store it. Data modeling is the design of that storage. The current mainstream approach is dimensional modeling, whereas business databases usually use relational modeling.

In the figure above, the left side is the business database model: tables such as orders, customers, and products correspond to business entities, which suits inserts and lookups by operational systems and reduces the SQL queries needed. It generally conforms to normal-form modeling requirements.

The right side is a data warehouse star schema for the sales subject: a single fact table links to dimension tables that hold the other information. It features low data redundancy, stores most attributes in dimension tables, has a clear structure, and is easy to use with data analysis tools. It is worth mentioning that the query language commonly used for multidimensional analysis on a data warehouse is MDX rather than SQL; a later article will separately cover MDX tools and basic syntax.
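
To make the star schema concrete, here is a minimal sketch for a sales subject, again using Python's built-in sqlite3 module and plain SQL rather than MDX. The table names (fact_sales, dim_product, dim_customer), columns, and sample rows are assumptions for illustration and are not taken from the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold the descriptive attributes.
    CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);

    -- The fact table holds measures plus keys pointing at the dimensions.
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        product_id  INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        sale_date   TEXT,
        quantity    INTEGER,
        amount      REAL
    );

    INSERT INTO dim_product VALUES (1, 'Tea', 'Drinks'), (2, 'Cup', 'Kitchen');
    INSERT INTO dim_customer VALUES (1, 'Alice', 'Beijing'), (2, 'Bob', 'Shanghai');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, '2024-01-01', 2, 20.0),
        (2, 2, 1, '2024-01-01', 1, 15.0),
        (3, 1, 2, '2024-01-02', 3, 30.0);
    """
)

# A typical analytical query: join the fact table to one dimension and aggregate.
sales_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(sales_by_category)
```

Analysing by a different dimension, such as customer city, only changes which dimension table is joined and grouped on; the fact table stays the same.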

06 Architecture

A data warehouse architecture can be designed and built in two ways: top-down and bottom-up.

The top-down approach starts from the overall needs of the company, designs an overall data model, and uses ETL to bring all the necessary data into the corresponding model objects in a unified way. Finally, the data is made available to the company's departments through specific access permissions.

The bottom-up approach starts from specific subjects on a small scale, builds a data model for a single department, and establishes a data warehouse that serves that department; these are also called data marts.
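
The relationship between a company-wide warehouse and a department-facing subset can be sketched as follows. This is a simplified illustration under stated assumptions: the table wh_sales and the view mart_retail_sales are hypothetical names, and a SQL view stands in for department-scoped access because SQLite has no permission system of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Top-down: one unified warehouse table loaded for the whole company.
    CREATE TABLE wh_sales (region TEXT, department TEXT, amount REAL);
    INSERT INTO wh_sales VALUES
        ('North', 'retail', 100.0),
        ('South', 'retail', 50.0),
        ('North', 'wholesale', 300.0);

    -- The retail department is given only its own slice of the unified model.
    CREATE VIEW mart_retail_sales AS
        SELECT region, amount FROM wh_sales WHERE department = 'retail';
    """
)

retail_total = conn.execute(
    "SELECT SUM(amount) FROM mart_retail_sales"
).fetchone()[0]
print(retail_total)
```

In a bottom-up build the order would be reversed: a table like mart_retail_sales would be modeled and loaded first for the department, without waiting for a company-wide model.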

As for how to choose an architecture, consider the following two perspectives.

  • Management perspective

    • Project risk: the top-down approach may take longer, and transferring data between departments is bound to become a constraint
    • Business value: more focused data makes it easier to find the relationships it contains
  • IT department perspective

    • Funding and policy sources: identify clearly the project's sponsor and its beneficiaries
    • Data sources: identify clearly who can provide the data

07 Summary

This article briefly covered the history of data storage, why data warehouses emerged, the definition of a data warehouse, data modeling, and data warehouse architecture design.


Origin www.cnblogs.com/shenfeng/p/datawarehouse_intro_1.html