The difference between a data warehouse and a database

One, data warehouse

  1. What is a data warehouse?
    A data warehouse, abbreviated DW or DWH, is a strategic collection of data of all types that supports the decision-making and planning processes at every level of an enterprise. It is built for analytical reporting and decision support. It serves companies that need business intelligence to guide business process improvement and to monitor time, cost, quality, and control.
  2. What can a data warehouse do? (A few examples)

    1. Setting annual sales targets: the decision has to be based on historical reports and cannot be made arbitrarily.
    2. Optimizing business processes
      For example: an e-commerce platform for a mobile phone brand analyzes the past five years of purchases to find which age groups buy the most and in which seasons more people buy. It can then treat those groups as its target population and, according to their characteristics, dynamically adjust production and warehouse inventory to match the expected demand.
  3. Characteristics of data warehouse

    1. The data warehouse is subject-oriented.
      1. Unlike traditional databases, data warehouses are subject-oriented. So what is a subject? First, a subject is a high-level abstraction: it is the object around which data in the enterprise information system is integrated, classified, and analyzed at a higher level. Logically, it corresponds to the analysis object of a particular macro-level analysis area in the enterprise. (In plain language: a subject is a key aspect that users care about when they make decisions with the data warehouse. A subject usually relates to several operational information systems, whereas the data in an operational database is organized around transaction-processing tasks, and each task is isolated from the others.)
    2. The data warehouse is integrated.
      1. The data in the data warehouse is extracted from the original, scattered operational databases (relational databases such as MySQL). Operational databases are quite different from DSS (Decision Support System) analytical databases. First, the source data for each subject of the data warehouse is duplicated and inconsistent across the scattered databases, and the data from each online system is tied to its own application logic. Second, the aggregated data in the data warehouse cannot be obtained directly from the original database systems. Therefore, data must be unified and consolidated before it enters the data warehouse. This is the most critical and most complex step in building a data warehouse. The work to be done is:
        1. Resolve all inconsistencies in the source data, such as homonyms (same name, different meaning), synonyms (different names, same meaning), inconsistent units, and inconsistent field lengths.
        2. Consolidate the data and compute aggregates. Aggregated data can be produced while extracting from the source databases, but much of it is produced inside the data warehouse, that is, after the data has been loaded.
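
The unification step described above can be sketched in a few lines of Python. This is only a minimal illustration under assumed conditions: the two source systems, their field names (`cust_name`, `customer`, `height_cm`, `height_in`), and the sample record are hypothetical and do not come from any real system.

```python
# Hypothetical source records: two operational systems describe the same
# customer with different field names, units, codes, and padded strings.
crm_rows = [{"cust_name": "Alice Zhang", "height_cm": 170, "gender": "F"}]
shop_rows = [{"customer": "ALICE ZHANG          ", "height_in": 66.9, "sex": "female"}]

# One agreed code per meaning, whatever spelling the source system uses.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def from_crm(row):
    """Map a CRM record onto the unified warehouse schema."""
    return {
        "customer_name": row["cust_name"].strip().title(),
        "height_cm": round(float(row["height_cm"]), 1),
        "gender": GENDER_CODES[row["gender"].lower()],
    }


def from_shop(row):
    """Map a shop record: rename fields, convert inches to cm, trim padding."""
    return {
        "customer_name": row["customer"].strip().title(),
        "height_cm": round(row["height_in"] * 2.54, 1),
        "gender": GENDER_CODES[row["sex"].lower()],
    }


# After mapping, rows from both systems share one name, one unit, one code set.
unified = [from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows]
```

Each source system gets its own small mapping function, so the rules for renaming fields and converting units stay in one place per source and the warehouse side only ever sees the unified schema.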
    3. The data in the data warehouse changes over time.
      1. Saying that warehouse data is non-updatable describes how applications use it: users of the data warehouse do not update data while analyzing it. It does not mean that the data never changes during its whole life cycle, from the moment it is integrated into the data warehouse until it is finally deleted.
      2. Data in a data warehouse changes over time, which is one of its defining characteristics. This shows up in three ways:
        1. The data warehouse keeps adding new data over time. The warehouse system must continuously capture changed data in the OLTP databases and append it, that is, it keeps generating snapshots of the OLTP databases and adds them to the warehouse after unified integration. Existing snapshots are left unchanged: when new changes are captured, a new snapshot is generated and added, and the earlier snapshots are not modified.
        2. The data warehouse keeps deleting old data over time. Warehouse data also has a retention period, and once that period has passed, the expired data is deleted. The retention period in a data warehouse is simply much longer than in an operational environment. An operational environment typically keeps only 60 to 90 days of data, while a data warehouse needs to keep data for much longer (for example, 5 to 10 years) to support trend analysis in the DSS.
        3. The data warehouse contains a large amount of aggregated data, much of which is time-related. For example, data is often aggregated by time period or sampled at certain time slices. This data has to be re-aggregated continually as time passes. For this reason, warehouse data carries a time element that identifies the historical period it belongs to.
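
The append-and-expire behavior described above can be sketched with Python's built-in `sqlite3` module. This is a rough illustration only: the table `stock_snapshot`, the helper names `append_snapshot` and `purge_expired`, and the sample dates are assumptions made for the example, and a real warehouse would normally use a dedicated analytical engine rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_snapshot ("
    " snapshot_date TEXT NOT NULL,"  # time element marking the historical period
    " item TEXT NOT NULL,"
    " quantity INTEGER NOT NULL,"
    " PRIMARY KEY (snapshot_date, item))"
)


def append_snapshot(snapshot_date, rows):
    """Append one captured state; earlier snapshots are never updated."""
    conn.executemany(
        "INSERT INTO stock_snapshot VALUES (?, ?, ?)",
        [(snapshot_date, item, qty) for item, qty in rows],
    )


def purge_expired(cutoff_date):
    """Delete snapshots older than the retention cutoff.

    ISO-formatted dates (YYYY-MM-DD) compare correctly as plain text.
    """
    cur = conn.execute(
        "DELETE FROM stock_snapshot WHERE snapshot_date < ?", (cutoff_date,)
    )
    return cur.rowcount


append_snapshot("2015-01-01", [("phone", 120)])
append_snapshot("2024-01-01", [("phone", 80)])
append_snapshot("2024-01-02", [("phone", 75)])

# Only the snapshot that has passed the (hypothetical) retention period goes.
removed = purge_expired("2016-01-01")
history = conn.execute(
    "SELECT snapshot_date, quantity FROM stock_snapshot"
    " WHERE item = ? ORDER BY snapshot_date",
    ("phone",),
).fetchall()
```

Because the snapshot date is part of the primary key, a change in the source system produces a new row instead of overwriting an old one, which is what keeps the history available for trend analysis.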
    4. The data in the data warehouse cannot be modified.
      1. The data in the data warehouse is mainly used for enterprise decision analysis. The operations involved are mostly queries, and modifications are generally not performed. Warehouse data reflects historical content over a long period: it is a collection of database snapshots taken at different points in time, together with data derived from those snapshots through statistics, aggregation, and reorganization, rather than live transactional data. Online data from the operational databases is integrated into the data warehouse, and once data has been stored longer than the warehouse's retention period, it is deleted from the current data warehouse. Because the data warehouse mostly serves queries, its management system is simpler than that of an operational database. Many of the hard problems in database management systems, such as integrity protection and concurrency control, matter far less in a data warehouse. However, because warehouse queries often touch very large amounts of data, the demands on query performance are higher, which calls for a variety of sophisticated indexing techniques. At the same time, the data warehouse serves the senior management of the enterprise, who place higher demands on the usability of the query interface and on how data is presented.

Two, the difference between data warehouse and database

  1. Before looking at the difference, we need to understand three concepts: database software, database, and data warehouse.
    1. Database software: a kind of software (not the graphical client that connects to the database) that implements the logic of the database. It belongs to the physical layer.
    2. Database: a logical concept, a repository for storing data, implemented by database software. A database consists of many tables. Each table is two-dimensional and has many fields: the fields are laid out as columns, and data is written into the table row by row. Two-dimensional tables can still express multi-dimensional relationships. Database software used to implement a database includes Oracle, DB2, MySQL, Sybase, and Microsoft SQL Server.
    3. Data warehouse: an extension of the database concept. Logically, a database and a data warehouse are the same kind of thing: both are places where data is stored through database software. In terms of data volume, however, a data warehouse is usually much larger than a database. The data warehouse is mainly used for data mining and data analysis, to help managers make decisions.
    4. In an IT architecture, a database must exist, because the data has to be stored somewhere. Take online shopping and other e-commerce companies: how many items are in stock, the price of each item, the user's account balance, and so on are all stored in a back-end database. Or, as the simplest example, consider our WeChat, Weibo, and QQ accounts and passwords. The back-end database must have a user table with at least two fields, username and password, and our data is stored in that table row by row. When we log in, we enter a username and password, which are sent to the back end and matched against the data in the table. If they match, we are logged in; if not, an error is reported. This is the database, which works in the production environment. All business-facing applications use databases.
    5. The data warehouse is one of the technologies under BI. Because a database is tied to a business application, no single database can hold all of a company's data. Database tables are usually designed for one particular application. In the login example above, the user table has only those two fields and nothing else. The table suits that application perfectly well, but it does not suit analysis. Suppose I want to know in which time period the number of users is highest, or which user makes the most purchases in a year. To answer questions like these, the table structure has to be redesigned. The concept of the data warehouse was introduced for data analysis and data mining: its table structure is designed around the analysis requirements, analysis dimensions, and analysis metrics.
    6. The difference between a database and a data warehouse is really the difference between OLTP and OLAP.
      1. Operational processing, called OLTP (On-Line Transaction Processing), also known as transaction-oriented processing, is the day-to-day online operation of specific business functions against the database, usually querying and modifying a few records at a time. Users care most about response time, data security, data integrity, and the number of concurrent users supported. Traditional database systems, as the main means of data management, are mainly used for operational processing.
      2. Analytical processing, called OLAP (On-Line Analytical Processing), generally analyzes historical data on particular subjects to support management decision-making.
Operational processing (OLTP)                                 | Analytical processing (OLAP)
Detailed data                                                 | Aggregated or refined data
Entity-Relationship (ER) model                                | Star schema or snowflake schema
Stores current, point-in-time data                            | Stores historical data, excluding the most recent data
Updatable                                                     | Read-only, append-only
Operates on one unit at a time                                | Operates on one collection at a time
High performance requirements, short response time            | Relaxed performance requirements
Transaction-oriented                                          | Analysis-oriented
Small amount of data per operation                            | Supports decision-making needs
Small data volume                                             | Large data volume
Customer orders, inventory levels, and bank account inquiries | Customer profitability analysis, market segmentation
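
To make the OLTP/OLAP contrast concrete, here is a small sketch using Python's built-in `sqlite3` module. Everything in it is an assumption made for illustration: the table names (`inventory`, `dim_customer`, `dim_date`, `fact_sales`), the age groups, and the sample rows are hypothetical. In practice the operational and analytical tables would live in separate systems; they share one in-memory database here only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Operational (OLTP) side: one row per item, updated in place.
CREATE TABLE inventory (item TEXT PRIMARY KEY, in_stock INTEGER NOT NULL);
INSERT INTO inventory VALUES ('phone', 100);

-- Analytical (OLAP) side: a small star schema, one fact table, two dimensions.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, age_group TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, season TEXT);
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER NOT NULL
);
INSERT INTO dim_customer VALUES (1, '18-25'), (2, '26-35');
INSERT INTO dim_date VALUES (1, 2023, 'summer'), (2, 2023, 'winter');
INSERT INTO fact_sales VALUES (1, 1, 2), (1, 2, 1), (2, 2, 4), (2, 2, 3);
""")

# OLTP: a short transaction that reads and modifies a single record.
with conn:
    conn.execute(
        "UPDATE inventory SET in_stock = in_stock - 1 WHERE item = ?", ("phone",)
    )
in_stock = conn.execute(
    "SELECT in_stock FROM inventory WHERE item = ?", ("phone",)
).fetchone()[0]

# OLAP: a read-only query that joins the dimensions and aggregates all facts.
units_by_segment = conn.execute("""
    SELECT c.age_group, d.season, SUM(f.units) AS total_units
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY c.age_group, d.season
    ORDER BY total_units DESC, c.age_group, d.season
""").fetchall()
```

The operational statement touches one row and must finish quickly, while the analytical query reads every fact row and answers a question such as which age group buys the most in which season. That difference in access pattern is why the two sides are modeled so differently.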

Three, closing remarks

1. If you find any mistakes, please point them out and I will correct them promptly. If anything is unclear, leave a comment so we can discuss it.
2. This may seem trivial to some, but I take it seriously and treat it as my own notes and experience so that I can keep improving.

Welcome to my CSDN blog



Origin blog.csdn.net/qq_37823979/article/details/108737198