Data Warehousing Practice Notes - (a) - Overview


At a high level, the IT applications of a trade or business consist of two important parts: transaction systems and data analysis systems. In banking, for example, the transaction side is built on the core business system (Core Banking System) plus the main business systems (credit, intermediary business, payments, and so on) and the channel systems. The analysis side is built around the data warehouse, which feeds many analysis and reporting systems, such as general ledger analysis, customer risk, regulatory reporting, and management accounting.
Data warehouse technology has been around for many years, and in actual projects it has gone by different names: base database, data platform, ODS, and so on. Whatever it is called, it is the process of collecting, storing, and applying structured data using data processing and analysis techniques. With the rise and maturing of big data technology, the traditional data warehouse is enjoying a revival. In fact, the vast majority of data analysis and data mining algorithms were invented before the 1980s. Limited storage capacity and computing power meant that many of these techniques could not play the role they deserved. Big data technology greatly increases storage and computing power: computation becomes fast while the cost of the required hardware drops sharply. As long as you can compute fast enough, raw power beats ten clever tricks.
Big data technology is changing how the industry manages and applies data. Take banks as an example. After so many years of IT development, their internal use of data is already very mature: the statistical indicators for operations and management, together with those required by regulators, number in the thousands. Once a bank has big data capabilities, what else can it do? A few examples:

  • All data online : as on the various business platforms, all data can be used immediately. In the past, a bank's expired data was basically archived to disk, and restoring it was a major project in itself.
  • Dynamic reports : previously, statistics could only be computed in advance, and what users saw was static. In the big data era a report can interact with the user. In a stress test, for example, the user changes a parameter, the back end reruns the calculation online, and the result comes back quickly.
  • Attention to process data : what banks record and use today is basically result data, such as the outcome of a transfer. To complete that transaction, however, the user actually visits many pages and goes through several steps, and this process data better reflects the user's experience.
  • More data : obtain more data through Internet technology or partnerships, for precision marketing, risk control, and anti-fraud.
  • Unstructured data handling : system logs, customer call records, interaction records, and large volumes of images and video can all be put to use to optimize systems and to uncover users' potential needs and risks.

And there is much more.
When we talk about a data warehouse or a data analysis system, the categories OLTP and OLAP inevitably come up. Online transaction processing (OLTP) and online analytical processing (OLAP) are really two different ways of thinking and working. From the perspective of system design, they differ as follows:

  • Transactions
    • OLTP data is generally produced by online transactions, one transaction at a time (such as the frequent operations of a bank teller).
    • OLAP data is generally imported in bulk from the transaction systems. The so-called real-time or near-real-time data warehouse can be regarded as batch processing run at a higher frequency.
  • Data consistency
    • OLTP attaches great importance to data consistency. Whether strong or eventual, consistency must be achieved, and distributed systems also need to consider multi-phase transactions (such as two-phase commit).
    • OLAP basically does not consider transactions. After all, data is no longer modified once it has been batch imported, and end users only run queries.
  • Data structure
    • The data model (table structure) of an OLTP system is generally fixed. It reflects the business objects and relationships the system handles, and a change to the model inevitably leads to a change in the program.
    • An OLAP system does not care about the data structure of any particular business system. The target data structure is described by metadata (data that describes data), and the corresponding processing programs are generated from and driven by that metadata. If a data structure changes in a business system, only the metadata needs updating and the system keeps working.
  • Data manipulation
    • OLTP systems typically operate on data in an object-oriented way (O/R mapping generates entity classes). Many systems today are built on Java/Spring, where DAO (Data Access Object) techniques are very flexible: when a structure changes, the entity classes are regenerated automatically, which greatly reduces development and maintenance costs.
    • OLAP operates on large volumes of data, so processing efficiency comes first. Use the native operations of the storage system (for example, a database) as far as possible, and consider whether the data distribution allows parallel processing.
  • Architecture
    • OLTP systems care about concurrent processing capacity (TPS: transactions per second, the number of transactions answered per second). Inserting and updating data quickly makes the various distributed architectures very complex.
    • OLAP also has QPS (queries per second), but the system does not modify data and is independent of transactions, so it basically only needs to distribute queries (load balancing). The focus is more on efficient workflow handling and batch processing of large data volumes, and on using batch processing to reduce the real-time computation needed when users submit requests (ad hoc queries).
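
The last point, using batch processing to reduce computation at query time, can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the table names txn_detail and daily_summary, the column names, and the sample rows are all hypothetical and not taken from any real system.

```python
import sqlite3


def build_daily_summary(conn):
    """Batch step: aggregate detail rows into a small summary table."""
    conn.execute("DROP TABLE IF EXISTS daily_summary")
    conn.execute(
        """
        CREATE TABLE daily_summary AS
        SELECT txn_date, branch, COUNT(*) AS txn_count, SUM(amount) AS total_amount
        FROM txn_detail
        GROUP BY txn_date, branch
        """
    )
    conn.commit()


def query_summary(conn, txn_date):
    """Query step: read precomputed rows instead of scanning the detail table."""
    cur = conn.execute(
        "SELECT branch, txn_count, total_amount FROM daily_summary "
        "WHERE txn_date = ? ORDER BY branch",
        (txn_date,),
    )
    return cur.fetchall()


def demo():
    """Load a few hypothetical detail rows, run the batch, then query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE txn_detail (txn_date TEXT, branch TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO txn_detail VALUES (?, ?, ?)",
        [
            ("2019-10-01", "A", 100),
            ("2019-10-01", "A", 50),
            ("2019-10-01", "B", 70),
            ("2019-10-02", "A", 30),
        ],
    )
    build_daily_summary(conn)
    return query_summary(conn, "2019-10-01")
```

The expensive GROUP BY runs once in the batch step, and each user query then touches only the small summary table. A real warehouse would run the batch on a schedule and usually refresh the summary incrementally rather than rebuild it.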

Each type of system is complex in its own way. To move from transaction systems to analysis systems, besides learning the various statistical analysis algorithms, you should concentrate on learning how the system architecture changes.
Data analysis, or big data, is not a matter of installing Hadoop, running a few MapReduce examples, or knowing a few algorithms. It is a complete and rigorous piece of systems engineering. Besides familiarity with the industry and an understanding of the data, there are many techniques to master, covering data acquisition, validation, processing, storage, and application. The details are as follows:
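
One of the architectural differences listed above, metadata-driven processing, can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the METADATA layout, the customer table, and the helper names build_ddl, build_insert, and load_batch are all hypothetical, and Python's built-in sqlite3 module stands in for the warehouse storage.

```python
import sqlite3

# Hypothetical metadata: describes the target table instead of hard-coding it.
METADATA = {
    "table": "customer",
    "columns": [
        {"name": "cust_id", "type": "INTEGER"},
        {"name": "name", "type": "TEXT"},
    ],
}


def build_ddl(meta):
    """Generate a CREATE TABLE statement from the metadata."""
    cols = ", ".join(c["name"] + " " + c["type"] for c in meta["columns"])
    return "CREATE TABLE {} ({})".format(meta["table"], cols)


def build_insert(meta):
    """Generate a parameterized INSERT statement from the metadata."""
    names = [c["name"] for c in meta["columns"]]
    col_list = ", ".join(names)
    placeholders = ", ".join("?" for _ in names)
    return "INSERT INTO {} ({}) VALUES ({})".format(meta["table"], col_list, placeholders)


def load_batch(conn, meta, rows):
    """Bulk-load source rows (dicts); fields missing from a row become NULL."""
    names = [c["name"] for c in meta["columns"]]
    params = [tuple(row.get(n) for n in names) for row in rows]
    conn.executemany(build_insert(meta), params)
    conn.commit()
    return len(params)


def demo():
    """Create the table from metadata, load two sample rows, read them back."""
    conn = sqlite3.connect(":memory:")
    conn.execute(build_ddl(METADATA))
    rows = [{"cust_id": 1, "name": "Alice"}, {"cust_id": 2, "name": "Bob"}]
    load_batch(conn, METADATA, rows)
    return conn.execute(
        "SELECT cust_id, name FROM customer ORDER BY cust_id"
    ).fetchall()
```

If the source system adds a column, only the METADATA entry changes; build_ddl, build_insert, and load_batch stay as they are. Table and column names are interpolated into the SQL text here, so in this sketch the metadata is assumed to come from a trusted source.
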

  • Data tiering;
  • Overall implementation framework;
  • Metadata;
  • ETL;
  • Data validation;
  • Data standardization;
  • Data de-duplication;
  • Processing and storage of incremental data and zipper (history chain) tables;
  • Data storage characteristics of the data warehouse;
  • Data model;
  • Dimensional model and slowly changing dimensions;
  • Data backtracking (rewind);
  • Reports;
  • Data mining.

Each of these will be described in later installments. This series basically does not cover the use of specific tools; it concentrates on concepts, ideas, and general-purpose algorithms.

To be continued.

Next: Data Warehousing Practice Notes - (b) - Data Tiering


Origin blog.csdn.net/cfy_fantasyxx/article/details/102812931