Data Mining (7.1)--Data Warehouse

Table of contents

Introduction

1. Database

1 Introduction

2. Database Management System (DBMS)

2. Data Warehouse

Data Warehouse Characteristics

The role of data warehouse

Data Warehouse vs DBMS

Separate data warehouse and database


Introduction

The history of data warehousing can be traced back to the 1960s. At that time, most work in computing consisted of building individual applications that ran against master files. These applications were characterized by reports and programs, generally written in early programming languages such as Fortran or COBOL. The master files were stored on inexpensive magnetic tape, which had the disadvantage that it could only be accessed sequentially. By the 1970s, as computer technology developed, database management systems (DBMS) began to appear to manage data and improve access efficiency.
In 1975, Sperry Univac launched MAPPER, a database management and reporting system that included a 4GL. It was the world's first platform designed for building information centers, a precursor to contemporary data warehouse technology. By the 1980s, with the emergence of newer technologies such as personal computers (PCs) and fourth-generation programming languages (4GLs), the concept of the data warehouse began to take shape.

1. Database

1 Introduction

Databases arose from the needs of data processing. For example, in the late 1960s the United States collected many kinds of intelligence for wartime needs and stored it in computers, which is regarded as the origin of the database. With the development of computer technology, databases progressed from the file system stage to the database system stage, and then to the advanced database stage. Today databases are widely used in practice and have been combined with other computing and network technologies, giving rise to distributed databases, object-oriented databases, and network databases.

Data: the basic objects stored in a database; symbolic records used to describe things.

Database: an organized collection of data stored in a structured way.

A database is made up of tables, a table is made up of records, and a record is made up of fields.

(Figure: a sample table with labels ① field, ② record, ③ data)
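
The table, record, and field hierarchy described above can be sketched with Python's built-in `sqlite3` module. The `customer` table, its columns, and the sample rows are illustrative assumptions, not part of the original article.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows reading fields by name

# A table is defined by its fields (columns).
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Each inserted row is one record.
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Alice", "Boston"), (2, "Bob", "Denver")],
)

records = conn.execute("SELECT * FROM customer ORDER BY id").fetchall()
field_names = records[0].keys()  # the fields that make up each record

for record in records:
    # Each value in a record is one piece of data stored under a field.
    print({field: record[field] for field in field_names})
```

Here the database holds one table, the table holds two records, and each record holds three fields.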

2. Database Management System (DBMS)

A software system that enables users to define, create and maintain databases and provides controlled access to databases.

Examples: DB2, Oracle, MS SQL Server, MySQL, MS Access

Important functions of DBMS:

Data storage, retrieval (SQL), and updating (create/insert, read, update, delete).

Transaction support: ensures that either all of the updates in a given transaction are applied or none of them are.

Concurrency control: ensures that the database is updated correctly when multiple users update it at the same time.
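
The all-or-nothing behavior of a transaction can be sketched with `sqlite3`. This is a minimal illustration, assuming a hypothetical `account` table and a `transfer` helper that are not part of the original article. The connection's context manager commits if the block succeeds and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit is undone too.
        return False


def balances(conn):
    rows = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
    return dict(rows)


ok_small = transfer(conn, 1, 2, 30)    # succeeds
after_small = balances(conn)
ok_large = transfer(conn, 1, 2, 500)   # fails and is rolled back
after_large = balances(conn)
print(ok_small, after_small, ok_large, after_large)
```

The second transfer credits account 2 before the debit fails, yet the final balances are unchanged, because the whole transaction is rolled back.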

2. Data Warehouse

A data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and holds the information an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured or ad hoc queries, analytical reporting, and decision making.

Data Warehouse Characteristics

A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that supports management's decision-making process.

Subject-oriented

(1) Organized around major subjects, such as customers, products, and sales.
(2) Focuses on modeling and analyzing data for decision makers, rather than on daily operations or transaction processing.
(3) Provides a simple, concise view of a particular subject by excluding data that is not useful to the decision support process.

Integrated

(1) A data warehouse is built by integrating multiple heterogeneous data sources, including relational databases, data files, and online transaction records.
(2) Data cleaning and data integration techniques are applied while the data warehouse is built. Their purpose is to ensure consistency in naming conventions, encoding structures, and attribute measures when different data sources are combined. In addition, data is often transformed as it is moved into the data warehouse.
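
As a rough sketch of what consistency in naming, encoding, and attribute measures can mean, the example below maps rows from two hypothetical source systems into one common format. The source names, field names, gender codes, and units are all assumptions made for illustration.

```python
# Two hypothetical sources describe customers with different field names,
# different gender encodings, and different units of height.
crm_rows = [{"cust_id": 1, "sex": "M", "height_cm": 180.0}]
billing_rows = [{"customerNo": 2, "gender": "female", "height_in": 65.0}]

# One agreed encoding for the warehouse.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}
CM_PER_INCH = 2.54


def from_crm(row):
    """Rename fields and normalize the gender code; height is already in cm."""
    return {
        "customer_id": row["cust_id"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "height_cm": row["height_cm"],
    }


def from_billing(row):
    """Rename fields, normalize the gender code, and convert inches to cm."""
    return {
        "customer_id": row["customerNo"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "height_cm": round(row["height_in"] * CM_PER_INCH, 1),
    }


integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
for row in integrated:
    print(row)
```

After the transformation, every row uses the same field names, the same gender codes, and the same unit, regardless of its origin.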

Time-varying

(1) The time horizon of a data warehouse is significantly longer than that of an operational database. An operational database usually stores current data, while a data warehouse provides data from a historical perspective. For example, a data warehouse may hold data from the past 5 to 10 years, while an operational database holds data for the current period.
(2) In a data warehouse, every key structure contains a time element, either explicitly or implicitly. In contrast, key structures in an operational database do not necessarily contain a time element.
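
The difference in key structures can be sketched as follows. The `op_price` and `dw_price` tables and the `record_price` helper are hypothetical names chosen for this example: the operational table is keyed by product only, so each new price overwrites the old one, while the warehouse table includes a date in its key and therefore keeps the history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational table: key has no time element, only the current price is kept.
conn.execute("CREATE TABLE op_price (product_id INTEGER PRIMARY KEY, price REAL)")

# Warehouse table: the key explicitly contains a time element.
conn.execute(
    "CREATE TABLE dw_price ("
    "product_id INTEGER, snapshot_date TEXT, price REAL, "
    "PRIMARY KEY (product_id, snapshot_date))"
)


def record_price(product_id, day, price):
    # The operational row is overwritten; the warehouse gets a new row per date.
    conn.execute("INSERT OR REPLACE INTO op_price VALUES (?, ?)", (product_id, price))
    conn.execute("INSERT INTO dw_price VALUES (?, ?, ?)", (product_id, day, price))


record_price(1, "2023-01-01", 9.99)
record_price(1, "2023-06-01", 12.49)

current = conn.execute(
    "SELECT price FROM op_price WHERE product_id = 1"
).fetchall()
history = conn.execute(
    "SELECT snapshot_date, price FROM dw_price "
    "WHERE product_id = 1 ORDER BY snapshot_date"
).fetchall()

print(current)
print(history)
```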

Non-volatile

(1) A data warehouse is a physically separate store of data that originates from the operational databases. In the most extreme case, if data in the data warehouse is damaged, it can be rebuilt from the operational data sources.
(2) Operational updates to data do not occur in a data warehouse. As a result, the data warehouse does not require transaction processing, recovery, or concurrency control mechanisms. Only two kinds of data operations are usually needed: the initial loading of data and access to data.
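
The "load and access only" idea can be illustrated with a small in-memory store. The `Warehouse` class and its `load` and `query` methods are assumptions made for this sketch, not an actual warehouse API. It deliberately offers no update or delete operation.

```python
class Warehouse:
    """Append-only store: rows can be loaded and read, never changed in place."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Store immutable copies so later changes by the caller have no effect.
        self._rows.extend(tuple(sorted(row.items())) for row in rows)

    def query(self, predicate):
        # Return fresh dicts; editing them does not touch the stored data.
        return [dict(row) for row in self._rows if predicate(dict(row))]


dw = Warehouse()
dw.load([{"year": 2022, "sales": 120}, {"year": 2023, "sales": 150}])

result = dw.query(lambda r: r["year"] >= 2023)
result[0]["sales"] = 0  # modifies only the returned copy

recheck = dw.query(lambda r: r["year"] >= 2023)
print(recheck)
```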

The role of data warehouse

Increase customer focus

  • Analyze customer purchase patterns and preferences

Fine-tune production strategy

  • Reposition products and manage product portfolios

Analyze business operations and find sources of profit

Manage customer relationships

Data Warehouse vs DBMS

OLTP (online transaction processing): the main task of a traditional relational DBMS. It covers day-to-day operations such as purchasing, inventory, finance, manufacturing, payroll, registration, and accounting.

OLAP (online analytical processing): the main task of a data warehouse system, which is data analysis and decision making.

The main differences between OLTP and OLAP

(1) Users and system orientation:

  • OLTP is customer-oriented and provides transaction and query processing for customers and operational staff;
  • OLAP is market-oriented and provides data analysis support for analysts and decision makers.

(2) Data content:

  • OLTP handles current, detailed data;
  • OLAP handles historical data that has been summarized and integrated.

(3) Database design:

  • OLTP systems usually adopt an entity-relationship (ER) data model and an application-oriented database design;
  • OLAP systems usually adopt a star schema and a subject-oriented database design.

(4) View:

  • OLTP focuses on current, local data and does not refer to historical data;
  • OLAP deals with information integrated from different data sources and often spans multiple versions of a database schema that result from the evolution of the organization.

(5) Access mode:

  • OLTP access consists of short operations such as updates and queries, which require concurrency control and recovery mechanisms;
  • OLAP access is mostly read-only, and many of these reads are complex queries.
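
To make the star schema and the read-only, aggregate style of OLAP queries more concrete, here is a minimal sketch using `sqlite3`. The table names (`fact_sales`, `dim_product`, `dim_date`) and the sample data are assumptions made for illustration. A central fact table references small dimension tables, and a typical analytical query joins them and aggregates over many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the subjects of analysis.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);

    -- The fact table holds measurements and points to the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product (product_id),
        date_id INTEGER REFERENCES dim_date (date_id),
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'toys');
    INSERT INTO dim_date VALUES (10, 2022), (11, 2023);
    INSERT INTO fact_sales VALUES
        (1, 10, 20.0), (1, 11, 30.0), (2, 11, 5.0), (2, 11, 15.0);
    """
)

# OLAP-style query: read-only, joins the dimensions, aggregates many rows.
sales_by_year_category = conn.execute(
    """
    SELECT d.year, p.category, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
    """
).fetchall()

for row in sales_by_year_category:
    print(row)
```

An OLTP workload on the same business would instead consist of many short statements that insert or update individual rows, which is why it depends on concurrency control and recovery.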

Separate data warehouse and database

Improve performance of both systems

  • DBMS: tuned for OLTP (queries, concurrency control, recovery)
  • Data warehouse: tuned for OLAP (complex OLAP queries)

Different functions and different data

  • Decision support requires historical data, which operational databases typically do not retain

Origin blog.csdn.net/weixin_53197693/article/details/131161202