Data Warehouse for Big Data Development

Concept

Data warehouse (DW): a data system for storage, analysis, and reporting. Its purpose is to build an analysis-oriented, integrated data environment whose analysis results provide decision support for the enterprise.

Analysis-oriented

  • The data warehouse itself does not generate any business data; its data comes from various external systems
  • The data warehouse does not consume data for its own purposes either; it organizes the collected data and makes the results available for external use


The difference between data warehouse and OLTP

When faced with a large amount of data, we often turn to OLTP (online transaction processing) databases.

Most OLTP databases, however, are business-oriented and built to support transactions, so analyzing large amounts of data puts them under great pressure.

In addition, the data of different business systems is usually stored in different databases and tables with inconsistent field types, which makes joining it together troublesome.

Data Warehouse Features

Subject-oriented: a relatively abstract concept. A subject can be understood as a dimension: an abstraction over the data of a business module after it has been classified.

Integration: the data for one subject may be distributed across different application systems, each of which stores it independently. This scattered data needs to be consolidated into the data warehouse.
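To make the integration step concrete, here is a minimal Python sketch. The two source systems ("crm" and "shop"), their field names, and the target layout are all hypothetical and chosen only for illustration; a real warehouse would normally do this in an ETL tool or in SQL.

```python
from datetime import date
from decimal import Decimal

# Hypothetical rows from two source systems. The same subject (orders)
# is stored with different field names and different field types.
crm_orders = [
    {"order_id": "1001", "amount": "19.90", "created": "2022-07-01"},
    {"order_id": "1002", "amount": "5.00", "created": "2022-07-02"},
]
shop_orders = [
    {"id": 2001, "total_cents": 2500, "ts": date(2022, 7, 1)},
]

def from_crm(row):
    """Convert a CRM row (all strings) into the unified layout."""
    return {
        "source": "crm",
        "order_id": int(row["order_id"]),
        "amount_cents": int(Decimal(row["amount"]) * 100),
        "order_date": date.fromisoformat(row["created"]),
    }

def from_shop(row):
    """Convert a shop row (already typed) into the unified layout."""
    return {
        "source": "shop",
        "order_id": row["id"],
        "amount_cents": row["total_cents"],
        "order_date": row["ts"],
    }

def integrate(crm_rows, shop_rows):
    """Merge both sources into one list with consistent names and types."""
    return [from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows]

warehouse_orders = integrate(crm_orders, shop_orders)
for order in warehouse_orders:
    print(order)
```

After this step every order has the same fields and types regardless of where it came from, which is what makes cross-system analysis practical.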

Non-volatile: data is pulled from other systems and analyzed; the warehouse does not create new data, and data that has been loaded is generally not modified afterwards.

Time-varying: the data in the data warehouse needs to be refreshed over time so that it keeps meeting the needs of decision-making.
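Non-volatile and time-varying can sound contradictory. One common way to reconcile them, sketched below in Python, is to append a dated snapshot on every load instead of overwriting earlier rows. The `warehouse` list, the `load_snapshot` helper, and the sample stock data are hypothetical stand-ins for a real warehouse table and load job.

```python
from datetime import date

# An append-only list standing in for a warehouse table.
warehouse = []

def load_snapshot(source_rows, load_date):
    """Append a dated copy of the source rows; earlier snapshots are never edited."""
    for row in source_rows:
        warehouse.append({**row, "load_date": load_date})

# Two daily loads of the same (hypothetical) source record.
load_snapshot([{"sku": "A1", "stock": 10}], date(2022, 7, 1))
load_snapshot([{"sku": "A1", "stock": 7}], date(2022, 7, 2))

# The history of one item is still available for trend analysis.
history = [
    (row["load_date"].isoformat(), row["stock"])
    for row in warehouse
    if row["sku"] == "A1"
]
print(history)
```

Because old snapshots are kept as they were loaded, the warehouse stays stable for analysis while still reflecting how the source data changes over time.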

Data Warehouse Development Language

As mentioned earlier, the data warehouse exists mainly to analyze data. In principle, any language that can read and process data can serve as the development language of a data warehouse.

For example, C, Java, and Python can all be used for data warehouse development; the choice comes down mainly to learning cost. SQL is relatively simple to learn, so it is now the main development language in the field of data analysis.
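The learning-cost argument can be illustrated with a small comparison. The sketch below computes the same per-category total twice: once with a hand-written Python loop and once with a single SQL statement run through Python's built-in `sqlite3` module. The `sales` table and its rows are made up for this example, and SQLite is used only because it needs no setup; it is not a data warehouse engine.

```python
import sqlite3
from collections import defaultdict

# Made-up sample data: (category, amount)
rows = [("books", 12.5), ("books", 7.5), ("toys", 30.0)]

# Approach 1: general-purpose code. We spell out how to group and sum.
totals = defaultdict(float)
for category, amount in rows:
    totals[category] += amount
python_result = sorted(totals.items())

# Approach 2: SQL. We only declare what result we want.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_result = conn.execute(
    "SELECT category, SUM(amount) FROM sales "
    "GROUP BY category ORDER BY category"
).fetchall()
conn.close()

print(python_result)
print(sql_result)
```

Both approaches give the same totals, but the SQL version states only the desired result and leaves the grouping logic to the engine.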

For example, Hive, which is introduced later, converts the SQL we write into MapReduce tasks and runs them on a Hadoop cluster to handle query and analysis over big data.
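To give a feel for what "converting SQL into a MapReduce task" means, the Python sketch below runs the map, shuffle, and reduce phases of a simple GROUP BY count in a single process. It is a conceptual illustration only: the function names and sample records are hypothetical, and it does not reproduce the plan Hive actually generates or how Hadoop distributes the work.

```python
from collections import defaultdict

# Query being illustrated (HiveQL-style):
#   SELECT category, COUNT(*) FROM sales GROUP BY category;

def map_phase(records):
    """Emit one (key, 1) pair per record; the key is the GROUP BY column."""
    for record in records:
        yield record["category"], 1

def shuffle(pairs):
    """Collect all values that share a key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Aggregate the values of each key; summing the 1s implements COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}

# Made-up input records.
sales = [
    {"category": "books", "amount": 12.5},
    {"category": "toys", "amount": 30.0},
    {"category": "books", "amount": 7.5},
]

counts = reduce_phase(shuffle(map_phase(sales)))
print(counts)
```

On a real cluster the map and reduce functions run in parallel on many machines, which is what lets the same query scale to data that does not fit on one node.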



Origin blog.csdn.net/weixin_44244088/article/details/126078981