[Data Warehouse] High-Frequency Data Warehouse Interview Questions in English (1)

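  Today's update is the English edition of the high-frequency data warehouse interview questions, split into three parts. This is the first part.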

What is Data Warehouse?

Data warehousing (DW) is a method of gathering and analysing data from many sources in order to get useful business insights. Typically, a data warehouse is used to integrate and analyse corporate data from many sources. The data warehouse is the heart of the business intelligence (BI) system, which is designed to analyse and report on data.

数据仓库 (DW) 是一种从多个来源收集和分析数据以获得有用的业务洞察力的方法。通常,数据仓库用于集成和分析来自多个来源的公司数据。数据仓库是商业智能 (BI) 系统的核心,旨在分析和报告数据。

It is a collection of technology and components that help with data strategy. It refers to a company’s electronic storage of a huge volume of data that is intended for inquiry and analysis rather than transaction processing. It is a method of converting data into information and making it available to people in a timely manner so that it can be used to make a difference. It is created by combining data from a variety of disparate sources to provide analytical reporting, structured and/or ad hoc queries, and decision-making. Cleaning, integrating, and consolidating data are all part of data warehousing.

它是有助于数据战略的技术和组件的集合。它是指公司以电子方式存储大量数据,用于查询和分析,而不是交易处理。它是一种将数据转换为信息并及时提供给人们的方法,以便可以用来产生影响。它是通过组合来自各种不同来源的数据来创建的,以提供分析报告、结构化和/或即席查询以及决策制定。清理、集成和整合数据都是数据仓库的一部分。

The data warehouse is kept separate from the company's operational database. It is an environment rather than a product: an architectural design for an information system that gives users access to current and historical decision-support data that is difficult to find in a standard operational data store.

数据仓库与公司的运营数据库不同。它是一种环境,而不是一种产品。它是一种信息系统的架构设计,使用户能够访问在标准操作数据存储中很难找到的当前和历史决策支持数据。

For example, a report on current inventory information may have more than 12 connected conditions. This can cause the query and report to take a long time to respond. A data warehouse introduces a novel design that can help to improve query performance and minimise response time for reporting and analytics.

例如,一份关于当前库存信息的报告可能有 12 个以上的关联条件。这可能会导致查询和报告需要很长时间才能响应。数据仓库引入了一种新颖的设计,可以帮助提高查询性能并最大限度地减少报告和分析的响应时间。

The following are some alternative names for the data warehouse system:

  • Decision Support System (DSS).
  • Management Information System.
  • Executive Information System.
  • Analytic Application.
  • Business Intelligence Solution.

以下是数据仓库系统的一些替代名称:

  • 决策支持系统 (DSS)
  • 管理信息系统
  • 行政信息系统
  • 分析应用
  • 商业智能解决方案

Data Warehouse Interview Questions for Freshers

1. What do you mean by data mining? Differentiate between data mining and data warehousing.

Data mining is the process of analysing large amounts of data to find patterns, trends, and other usable information that help a company make data-driven decisions. In other words, it examines hidden patterns in data from various perspectives and categorises them into useful information. That data is typically gathered and assembled in places such as data warehouses, where it supports efficient analysis, data mining algorithms, and decision making, ultimately helping to cut costs and generate revenue. Data mining examines enormous amounts of data automatically, looking for patterns and trends that go beyond simple analysis, and it applies advanced mathematical algorithms to data segments to estimate the probability of future events.

Following are the differences between data warehousing and data mining:

| Data Warehousing | Data Mining |
| --- | --- |
| A data warehouse is a database system intended for analytical rather than transactional purposes. | Data mining is the technique of examining data for patterns. |
| In data warehousing, data is stored on a regular basis. | In data mining, data is evaluated on a regular basis. |
| Data warehousing is carried out mainly by engineers. | Data mining is carried out by business users with the assistance of technologists. |
| Data warehousing is the process of bringing all relevant data together. | Data mining is the process of extracting information from large datasets. |
| Data warehousing supplies the consolidated data that data mining works on. | Data mining is a downstream analysis performed on that data. |

1. 数据挖掘是什么意思?区分数据挖掘和数据仓库。

数据挖掘是收集信息以发现模式、趋势和可用数据的过程,这些数据将帮助公司从大量数据中做出数据驱动的决策。换句话说,数据挖掘是从不同的角度分析数据的隐藏模式,分类成有用的数据,在数据仓库、高效分析、数据挖掘算法、辅助决策等特定领域进行收集和组装的方法。数据需求,最终导致成本削减和创收。数据挖掘是自动检查大量数据以寻找超出简单分析的模式和趋势的过程。数据挖掘通过对数据段使用高级数学算法来估计未来事件的概率。

以下是数据仓库和数据挖掘之间的区别:

| 数据仓库 | 数据挖掘 |
| --- | --- |
| 数据仓库是用于分析而非事务目的的数据库系统。 | 检查数据模式的技术称为数据挖掘。 |
| 在数据仓库中,数据会定期保存。 | 在数据挖掘中,数据被定期评估。 |
| 数据仓库工作主要由工程师完成。 | 在技术人员的帮助下,业务用户进行数据挖掘。 |
| 数据仓库是将所有相关数据汇集在一起的过程。 | 数据挖掘是从大数据集中提取信息的过程。 |
| 数据仓库为数据挖掘提供整合后的数据。 | 数据挖掘是建立在这些数据之上的下游分析。 |

2. What do you mean by OLAP in the context of data warehousing? What guidelines should be followed while selecting an OLAP system?

OLAP is an acronym for On-Line Analytical Processing. OLAP is a category of software technology that lets analysts, managers, and executives gain insight into information through fast, reliable, interactive access to data that has been transformed from raw data to reflect the true dimensionality of the company as its users understand it. OLAP supports multidimensional examination of corporate data together with complex calculations, trend analysis, and advanced data modelling. It is a foundation for business intelligence solutions, including business performance management, planning, budgeting, forecasting, financial reporting, analysis, modelling, knowledge discovery, and data warehouse reporting. End users can use OLAP to perform ad hoc analysis of data in several dimensions, giving them the information and understanding they need to make better decisions.

Following guidelines must be followed while selecting an OLAP system:

2. 在数据仓库的上下文中,OLAP 是什么意思?选择 OLAP 系统时应遵循哪些准则?

OLAP 是On-Line Analytical Processing的首字母缩写词. OLAP 是一种软件技术分类,它允许分析师、经理和高管通过快速、可靠、交互式地访问从原始数据转换而来的数据来深入了解信息,以反映客户所感知的公司的真实维度。OLAP 允许对企业数据进行多维检查,同时还允许进行复杂的估计、趋势分析和高级数据建模。它正在迅速改进智能解决方案的基础,包括业务绩效管理、战略、预算、预测、财务文档、分析、建模、知识发现和数据仓库报告。最终客户可以使用 OLAP 在多个维度上执行临时记录分析,为他们提供做出更好选择所需的信息和理解。

选择 OLAP 系统时必须遵循以下准则:

  • Multidimensional Conceptual View: This is one of an OLAP system’s most important capabilities. It is feasible to use methods like slice and dice that require a multidimensional view.
  • Transparency: Make the technology, the underlying data repository, computing operations, and the disparate nature of source data completely accessible to consumers. Users’ efficiency and productivity are improved as a result of this transparency.
  • Accessibility: OLAP systems must only allow access to the data that is truly needed to do the analysis, giving clients a single, coherent, and consistent picture. The OLAP system must map its own logical schema to the disparate physical data stores and carry out any required transformations.
  • Consistent Reporting Performance: As the number of dimensions or the size of the database grows, users should not experience any substantial reduction in reporting performance. That is, as the number of dimensions grows, OLAP performance should not deteriorate.
  • Client/Server Architecture: Make the OLAP tool's server component intelligent enough that the various clients can be connected with minimal effort and integration code. The server should be able to map and consolidate data from disparate databases.
  • Generic Dimensionality: Each dimension in an OLAP system should be equivalent in structure and operational capabilities. Selected dimensions may be granted additional operational capabilities, but it should be possible to grant such capabilities to any dimension.
  • **多维概念视图:**这是 OLAP 系统最重要的功能之一。使用需要多维视图的切片和骰子等方法是可行的。
  • **透明度:**技术、底层数据存储库、计算操作和源数据的不同性质都能被用户访问。这种透明性提高了用户的效率和生产力。
  • 可访问性: OLAP 系统必须只允许访问真正需要进行分析的数据,为客户提供单一、连贯且一致的视图。OLAP 系统必须将其自己的逻辑模式映射到不同的物理数据存储,并执行任何所需的转换。
  • **一致的报告性能:**随着维度数量或数据库大小的增长,用户不应体验到文档性能的任何显着降低。也就是说,随着维数的增加,OLAP 性能应该不会下降。
  • **客户端/服务器架构:**使 OLAP 工具的服务器组件足够聪明,从而可以用最少的工作和集成代码连接各种客户端。服务器应该能够映射和整合来自不同数据库的数据。
  • 通用维度: OLAP 方法中的每个维度在结构和操作能力方面都应该被视为相同的。选择维度可以被授予额外的操作能力,尽管这些职责应该适用于所有维度。
  • Dynamic Sparse Matrix Handling: The system should optimise sparse matrix handling by adapting the physical schema to the specific analytical model being built and loaded. When confronted with a sparse matrix, it must be able to infer the distribution of the data dynamically and adjust storage and access in order to achieve and maintain a constant level of performance.
  • Multiuser Support: OLAP technologies must allow several users to access data at the same time while maintaining data integrity and security.
  • Unrestricted Cross-dimensional Operations: The system should be able to recognise dimensional hierarchies and perform roll-up and drill-down operations within and across dimensions.
  • Intuitive Data Manipulation: Reorientation (pivoting), drill-down, roll-up, and other manipulations can be done intuitively and precisely on the cells of the analytical model using point-and-click and drag-and-drop methods. This removes the need for menus or repeated trips through the user interface.
  • Flexible Reporting: It gives business users efficiency by letting them arrange columns, rows, and cells in a way that allows for easy data manipulation, analysis, and synthesis.
  • Infinite Dimensions and Aggregation Levels: There should be no limit to the number of data dimensions. Within any given consolidation path, each of these generic dimensions must allow for an almost unlimited number of user-defined aggregation levels.
  • **动态稀疏矩阵处理:**通过使物理模式适应正在构建和加载的独特分析模型来优化稀疏矩阵处理。当面对稀疏矩阵时,系统必须能够动态地假设信息分布并改变存储和访问,以实现并保持恒定的性能水平。
  • 多用户支持: OLAP 技术必须允许多个用户同时访问数据,同时保持数据的完整性和安全性。
  • **不受限制的跨维度操作:**它使技术能够确定维度顺序并在维度内和跨维度执行汇总和向下钻取操作。
  • **直观的数据操作:**重新定向(旋转)、向下钻取和上卷以及其他操作可以使用点击和拖放方法在科学模型的单元格上直观而精确地完成。它消除了对菜单或多次访问用户界面的需要。
  • **灵活的报告:**它允许企业客户以一种便于数据操作、分析和综合的方式组织列、行和单元格,从而提高效率。
  • **无限维度和聚合级别:**数据维度的数量应该没有限制。在任何给定的整合路径中,这些公共维度中的每一个都必须允许几乎无限数量的客户定义的聚合级别。
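The slice, dice, and roll-up operations mentioned in the guidelines above can be sketched on a tiny in-memory cube. This is only a minimal illustration in plain Python: the `SALES` rows, the dimension names, and the helper functions are all hypothetical, and a real OLAP server would do far more (indexing, pre-aggregation, sparse storage).

```python
from collections import defaultdict

# Hypothetical sales facts: (year, region, product, amount).
SALES = [
    (2023, "East", "Laptop", 1200),
    (2023, "East", "Phone", 800),
    (2023, "West", "Laptop", 1500),
    (2024, "East", "Laptop", 1300),
    (2024, "West", "Phone", 900),
    (2024, "West", "Laptop", 1100),
]
DIMENSIONS = ("year", "region", "product")


def roll_up(rows, keep):
    """Aggregate amounts, keeping only the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, dimension, value):
    """Fix one dimension to a single value (a slice of the cube)."""
    i = DIMENSIONS.index(dimension)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Restrict several dimensions to sets of values (a sub-cube)."""
    return [
        r for r in rows
        if all(r[DIMENSIONS.index(d)] in vals for d, vals in allowed.items())
    ]


by_year = roll_up(SALES, ["year"])
east_by_product = roll_up(slice_(SALES, "region", "East"), ["product"])
sub_cube = dice(SALES, year={2024}, product={"Laptop"})
```

Drill-down is simply the reverse of `roll_up`: calling it with more dimensions in `keep` (for example `["year", "region"]`) returns the finer-grained totals.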

3. What do you understand about a fact table in the context of a data warehouse? What are the different types of fact tables?

In a Data Warehouse system, a Fact table is simply a table that holds all of the facts or business information that can be exposed to reporting and analysis when needed. Fields that reflect direct facts, as well as foreign fields that connect the fact table to other dimension tables in the Data Warehouse system, are stored in these tables. Depending on the model type used to construct the Data Warehouse, a Data Warehouse system can have one or more fact tables.

Following are the three types of fact tables:

3.您对数据仓库上下文中的事实表了解多少?有哪些不同类型的事实表?

在数据仓库系统中,事实表只是一个包含所有事实或业务信息的表,这些事实或业务信息可以在需要时提供给报告和分析。反映直接事实的字段,以及将事实表连接到数据仓库系统中其他维度表的外部字段,都存储在这些表中。根据用于构建数据仓库的模型类型,数据仓库系统可以有一个或多个事实表。

以下是三种类型的事实表:

  • Transactional Fact Table: This is the most basic and fundamental view of business processes. It records the occurrence of an event at a point in time, and its measures are valid only at that time and for that event. Its grain is typically "one row per line in a transaction." It usually holds data at the most detailed level, so a large number of dimensions are associated with it. Because it captures measurements at the smallest, atomic level, it gives users extensive dimensional grouping, roll-up, and drill-down reporting. It can be dense or sparse, because a row exists only when an event actually took place. It can also be very large, depending on the number of events (transactions) that have occurred.
  • Snapshot Fact Table: The snapshot depicts the state of things at a specific point in time, sometimes described as a "picture of the moment." It usually contains a greater number of non-additive and semi-additive facts. It helps in examining the company's overall performance at regular, predictable intervals. Unlike the transaction fact table, which adds a new row for each occurrence of an event, it represents the performance of an activity at the end of each day, week, month, or other time interval. Periodic snapshots still rely on the transaction fact table for the underlying detailed data. Periodic snapshot tables are typically large and take up a lot of space.
  • Accumulating Fact Table: These are used to depict the activity of any process with a well-defined beginning and end. Accumulating snapshots commonly contain multiple date stamps, which reflect the predictable stages or events that occur over the course of the process. There is sometimes an extra date column that indicates when the row was last updated.
  • **事务事实表:**这是公司流程的一个非常基础的视图。它可以用来描述在任何给定时间发生的事件。事实衡量标准仅在特定时间和特定事件中有效。根据与事务表关联的粒度,“每行表示一个事务”。它通常包含详细级别的数据,导致大量维度与之关联。它捕获最小或原子级别的度量。这允许该表为用户提供广泛的维度分组、汇总和下钻报告特征。它既拥挤又稀疏。它也可能同时很大,具体取决于已发生的事件(事务)的数量。
  • **快照事实表:**快照描述了特定时间点的事物状况,有时被称为“当下的快照”。它通常包含更多的非附加和半附加信息。它有助于在定期和可预测的时间检查公司的整体绩效。与事务事实表不同,事务事实表为每个事件的发生添加一个新行,快照事实表表示每天、每周、每月或任何其他时间间隔结束时活动的性能。然而,为了检索事务事实表中的详细数据,快照事实表或周期性快照依赖于事务事实表。定期快照表通常很大并且占用大量空间。
  • **累积事实表:**这些用于描述具有明确定义的开始和结束的任何流程的活动。多个数据戳通常出现在累积快照中,这些快照反映了在生命周期中发生的可预测阶段或事件。有时会有一个额外的列,其中包含指示该行上次更新时间的日期。
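The three fact table types above can be contrasted with a small `sqlite3` sketch. The table and column names (`fact_order_line`, `fact_inventory_daily`, `fact_order_pipeline`) are hypothetical and chosen only for illustration; they are not taken from any particular warehouse design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Transaction fact: one row per order line (hypothetical schema).
CREATE TABLE fact_order_line (
    order_id INTEGER, product_key INTEGER, date_key TEXT, quantity INTEGER
);
-- Periodic snapshot fact: one row per product per day.
CREATE TABLE fact_inventory_daily (
    product_key INTEGER, date_key TEXT, units_on_hand INTEGER
);
-- Accumulating snapshot fact: one row per order, updated at each milestone.
CREATE TABLE fact_order_pipeline (
    order_id INTEGER PRIMARY KEY,
    ordered_date TEXT, shipped_date TEXT, delivered_date TEXT,
    last_updated TEXT
);
""")

conn.executemany(
    "INSERT INTO fact_order_line VALUES (?, ?, ?, ?)",
    [(1, 10, "2024-01-01", 2), (1, 11, "2024-01-01", 1), (2, 10, "2024-01-02", 5)],
)
conn.executemany(
    "INSERT INTO fact_inventory_daily VALUES (?, ?, ?)",
    [(10, "2024-01-01", 98), (10, "2024-01-02", 93)],
)

# The accumulating row is inserted once, then revisited as the order progresses.
conn.execute(
    "INSERT INTO fact_order_pipeline VALUES (1, '2024-01-01', NULL, NULL, '2024-01-01')"
)
conn.execute(
    "UPDATE fact_order_pipeline "
    "SET shipped_date = '2024-01-03', last_updated = '2024-01-03' "
    "WHERE order_id = 1"
)

# Additive fact: quantities can be summed across any dimension.
units_sold = conn.execute(
    "SELECT SUM(quantity) FROM fact_order_line WHERE product_key = 10"
).fetchone()[0]

# Semi-additive fact: stock levels must not be summed across dates,
# so take the most recent snapshot instead.
latest_stock = conn.execute(
    "SELECT units_on_hand FROM fact_inventory_daily "
    "WHERE product_key = 10 ORDER BY date_key DESC LIMIT 1"
).fetchone()[0]

pipeline_row = conn.execute(
    "SELECT ordered_date, shipped_date, delivered_date "
    "FROM fact_order_pipeline WHERE order_id = 1"
).fetchone()
```

Note how the transaction and snapshot tables only ever receive inserts, while the accumulating table is the one that gets updated in place.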

4. What do you mean by dimension table in the context of data warehousing? What are the advantages of using a dimension table?

A dimension table is one of the two kinds of table in a data warehouse's star schema. Data warehouses are built on dimensional data models, which are made up of fact and dimension tables. Dimension tables contain dimension keys, values, and attributes and are used to describe dimensions. A dimension table is usually small, with anywhere from a few rows to thousands. It describes the objects referenced in the fact table; in other words, it is a group of data related to any measurable event. Dimension tables serve as the foundation for dimensional modelling. Each one includes a column that serves as a primary key, allowing each dimension row to be uniquely identified, and it is linked to the fact tables through this key. When the table is built, a system-generated key called the surrogate key is used to uniquely identify the rows in the dimension.

4. 在数据仓库的上下文中,维表是什么意思?使用维度表有什么好处?

数据仓库星型模式中的表称为维度表。由事实表和维度表组成的维度数据模型用于创建数据仓库。维度表包含维度键、值和属性,用于描述维度。它通常很小。行数可能从几到几千不等。它是对事实表中对象的描述。术语“维表”是指与任何可量化的事件有关的数据集合或数据组。它们是维度建模的基础。它包括一个用作主键的列,允许唯一标识每个维度行或记录。通过这个键,它链接到事实表。构建完成时,系统生成的代理键用于唯一标识维度中的行。

Following are the advantages of using a dimension table :

  • It features a straightforward design.
  • It is simple to study and comprehend.
  • It stores data that has been de-normalized.
  • It aids in the preservation of historical data for any dimension.
  • It’s simple to get info from it.
  • It’s simple to build and put into action.
  • It provides the context for any business operation.

以下是使用维度表的优点:

  • 它具有简单的设计。
  • 它很容易学习和理解。
  • 它存储已反规范化的数据。
  • 它有助于保存任何维度的历史数据。
  • 从中获取信息很简单。
  • 它很容易构建和付诸实施。
  • 它为任何业务操作提供上下文。
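As a rough sketch of how a dimension table with a surrogate key links to a fact table, the following `sqlite3` example builds a two-table star schema. The names `dim_product`, `fact_sales`, `product_key`, and `product_code` are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    product_code TEXT,                               -- natural (business) key
    product_name TEXT,
    category     TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
""")

# The surrogate keys 1, 2, 3 are generated by the database, not by the source system.
conn.executemany(
    "INSERT INTO dim_product (product_code, product_name, category) VALUES (?, ?, ?)",
    [
        ("P-100", "Laptop", "Electronics"),
        ("P-200", "Phone", "Electronics"),
        ("P-300", "Desk", "Furniture"),
    ],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(1, 1200.0), (2, 800.0), (1, 1300.0), (3, 250.0)],
)

# The dimension supplies the context (category) for the numeric facts.
sales_by_category = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
```

The fact table stores only the key and the measure; every descriptive attribute used for grouping or filtering comes from the dimension.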

5. What are the different types of dimension tables in the context of data warehousing?

Following are the different types of dimension tables in the context of data warehousing:

5. 数据仓库上下文中的维表有哪些不同类型?

以下是数据仓库上下文中不同类型的维度表:

  • Slowly Changing Dimensions (SCD):
    Slowly changing dimensions are dimensions whose attributes change slowly and irregularly over time rather than on a fixed schedule. For example, an address and phone number may change, but not on a regular basis. Consider a person who moves between several countries and must update their address each time. Such a change can be handled in one of three ways:
    • Type 1: Overwrite the value that was previously stored. This strategy is simple to implement and reduces cost by saving space. However, the history is lost.
    • Type 2: Insert a new row containing the new value. This method preserves the full history and allows it to be accessed at any time. However, it takes up more space, which raises the cost.
    • Type 3: Add a new column to the table that holds the previous value alongside the current one. This keeps a limited history (usually only the most recent change) without adding rows.
  • 缓慢变化维 (SCD):
    缓慢变化维是随着时间趋于缓慢变化而不是在固定时间段内变化的维度属性。例如,地址和电话号码可能会更改,但不会定期更改。考虑一个人去几个国家旅行并且必须根据他所访问的地方更改他的地址的情况。这可以通过以下三种方式之一来完成:
    • 类型 1:替换之前输入的值。该策略易于实施,并通过节省空间来帮助降低成本。然而,在这种情况下,历史数据会丢失。
    • 类型 2:插入包含新值的新行。此方法保存历史记录并允许随时访问它。但是,它占用了大量空间,从而增加了成本。
    • 类型 3:向表中添加新列,用于保存变更前的值。这种方式只保留有限的历史(通常只有最近一次变更),但不会增加行数。
  • Junk Dimension: A junk dimension is a collection of low-cardinality attributes. It contains a number of varied or disparate attributes that are unrelated to one another. These can be used to implement RCD (rapidly changing dimension) features like flags and weights, among other things.
  • **垃圾维度:**垃圾维度是低基数属性的集合。它包含许多彼此无关的变化的或完全不同的特征。这些可用于实现 RCD(快速变化的维度)功能,例如标志和权重等。
  • Conformed Dimension: Multiple subject areas or data marts share this dimension. It can be utilised in a variety of projects without requiring any changes. This is used to keep things in order. Dimensions that are exactly the same as or a proper subset of any other dimension are known as conformed dimensions.
  • **一致维度:**多个主题区域或数据集市共享此维度。它可以在各种项目中使用,无需任何更改。这用于使事情井井有条。与任何其他维度完全相同的维度或任何其他维度的适当子集称为一致维度。
  • Role-play Dimension: A role-play dimension is a dimension table that has several relationships with the fact table. In other words, the same dimension key and all of its associated attributes are referenced by more than one foreign key in the fact table. Within the same database, it can serve several roles.
  • **角色扮演维度:**角色扮演维度是指与事实表有很多关系的维度表。换句话说,当相同的维度键及其所有关联属性链接到事实表中的大量外键时,就会发生这种情况。在同一个数据库中,它可能扮演多个角色。
  • Degenerate Dimension: Degenerate dimension attributes are those that are contained in the fact table itself rather than in a separate dimension table. For instance, a ticket number, an invoice number, a transaction number, and so on.
  • **退化维度:**退化维度属性是那些包含在事实表本身而不是单独的维度表中的属性。例如,票号、发票号、交易号等。
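The three slowly changing dimension strategies listed above (Type 1, Type 2, Type 3) differ mainly in how much history survives a change. The sketch below shows each one on a hypothetical customer dimension row held as a plain dictionary; the field names (`customer_key`, `valid_from`, `is_current`, `previous_city`) are illustrative assumptions, not a fixed standard.

```python
from datetime import date


def scd_type1(row, new_city):
    """Type 1: overwrite in place, so no history is kept."""
    return {**row, "city": new_city}


def scd_type2(rows, customer_id, new_city, change_date):
    """Type 2: close the current row and append a new one with a new surrogate key."""
    next_key = max(r["customer_key"] for r in rows) + 1
    updated = []
    for r in rows:
        if r["customer_id"] == customer_id and r["is_current"]:
            r = {**r, "valid_to": change_date, "is_current": False}
        updated.append(r)
    updated.append({
        "customer_key": next_key,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })
    return updated


def scd_type3(row, new_city):
    """Type 3: keep only the previous value in an extra column."""
    return {**row, "previous_city": row["city"], "city": new_city}


base = {"customer_key": 1, "customer_id": "C-1", "city": "Paris"}

t1 = scd_type1(base, "Berlin")

history = [{**base, "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True}]
t2 = scd_type2(history, "C-1", "Berlin", date(2024, 6, 1))

t3 = scd_type3(base, "Berlin")
t3_again = scd_type3(t3, "Rome")  # the original "Paris" value is now gone
```

After a second change, the Type 3 row remembers only "Berlin" as the previous city, which is why Type 2 is the usual choice when full history matters.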

6. Differentiate between fact table and dimension table.

A record in a fact table can combine attributes from several dimension tables. The fact table lets the user examine the business measures that support decision making aimed at improving the business. Dimension tables, on the other hand, supply the dimensions along which those measurements are taken.

The following table enlists the difference between a fact table and a dimension table:

| Fact Table | Dimension Table |
| --- | --- |
| It contains the measurements, facts, or metrics of a business process. | It is the companion table that holds the attributes describing those facts. |
| It is defined by its grain (the most atomic level at which facts may be defined). | It is detailed, comprehensive, and wide. |
| It is used for analysis and decision-making and contains measures. | It contains descriptive information about a company's operations and procedures. |
| It contains information in both numeric and textual formats. | It mostly contains textual information. |
| Its foreign keys reference the primary keys of the dimension tables. | Its primary key is referenced by a foreign key in the fact table. |
| It stores the atomic measurements that reports aggregate. | It supplies the filter values and report labels. |
| It does not have a hierarchy. | It can have a hierarchy. |
| It has fewer attributes than a dimension table. | It has more attributes than a fact table. |
| It has more records than a dimension table. | It has fewer records than a fact table. |
| The table grows vertically (more rows). | The table grows horizontally (more columns). |
| It is created after the corresponding dimension tables have been created. | It is created before the fact table. |

6.区分事实表和维度表。

事实表的记录可以由来自各种维度表的属性组成。事实表帮助用户考察业务的各个方面,为改善公司经营的决策提供支持。另一方面,维度表为事实表提供进行度量时所依据的维度。

下表列出了事实表和维度表之间的区别:

| 事实表 | 维度表 |
| --- | --- |
| 它包含业务过程的度量、事实或指标。 | 它是伴随表,保存用于描述这些事实的属性。 |
| 它由数据粒度(可以定义事实的最原子级别)定义。 | 它详细、全面、字段较多。 |
| 它用于分析和决策,并包含度量。 | 它包含有关公司运营和流程的描述性信息。 |
| 它包含数字和文本格式的信息。 | 它主要包含文本信息。 |
| 它的外键引用维度表的主键。 | 它的主键被事实表中的外键引用。 |
| 它存储报告所聚合的原子度量。 | 它提供过滤条件取值和报告标签。 |
| 它没有层次结构。 | 它可以有层次结构。 |
| 它的属性比维度表少。 | 它比事实表具有更多的属性。 |
| 与维度表相比,它有更多的记录。 | 它的记录比事实表少。 |
| 表格垂直增长(行数增加)。 | 表格水平增长(列数增加)。 |
| 它是在创建相应的维度表之后创建的。 | 它是在创建事实表之前创建的。 |

7. What are the advantages of a data warehouse?

Following are the advantages of using a data warehouse:

7. 数据仓库有什么优势?

以下是使用数据仓库的优点:

  • Helps you save time:
    • To stay ahead of your competitors in today’s fast-paced world of cutthroat competition, your company’s ability to make smart judgments quickly is critical.
    • A Data warehouse gives you instant access to all of your essential data, so you and your staff don’t have to worry about missing a deadline. All you have to do now is deploy your data model to start collecting data in a matter of seconds. You can do this with most warehousing solutions without utilising a sophisticated query or machine learning.
    • With data warehousing, your company won’t have to rely on a technical professional to troubleshoot data retrieval issues 24 hours a day, seven days a week. You will save a lot of time this way.
  • 帮助您节省时间:
    • 为了在当今快节奏的残酷竞争世界中领先于竞争对手,贵公司快速做出明智判断的能力至关重要。
    • 数据仓库可让您即时访问所有重要数据,因此您和您的员工不必担心错过最后期限。您现在所要做的就是部署您的数据模型以在几秒钟内开始收集数据。您可以使用大多数仓储解决方案来做到这一点,而无需使用复杂的查询或机器学习。
    • 借助数据仓库,您的公司将不必依赖技术专业人员每周 7 天、每天 24 小时对数据检索问题进行故障排除。通过这种方式,您将节省大量时间。
  • Enhances the quality of data:
    • The high-quality data ensures that your company’s policies are founded on accurate information about your operations.
    • You can turn data from numerous sources into a shared structure using data warehousing. You can assure the consistency and integrity of your company’s data this way. This allows you to spot and eliminate duplicate data, inaccurately reported data and disinformation.
    • For your firm, implementing a data quality management program may be both costly and time-consuming. You can easily use a data warehouse to reduce the number of these annoyances while saving money and increasing the general productivity of your company.
  • 提高数据质量:
    • 高质量的数据可确保您公司的政策建立在有关您运营的准确信息的基础上。
    • 您可以使用数据仓库将来自众多来源的数据转换为共享结构。您可以通过这种方式确保公司数据的一致性和完整性。这使您能够发现并消除重复数据、不准确报告的数据和虚假信息。
    • 对于您的公司而言,实施数据质量管理计划可能既昂贵又耗时。您可以轻松地使用数据仓库来减少这些烦恼的数量,同时节省资金并提高公司的总体生产力。
  • Enhances Business Intelligence (BI):
    • Throughout your commercial endeavours, you can use a data warehouse to gather, absorb, and derive data from any source. As a result of the capacity to easily consolidate data from several sources, your BI will improve by leaps and bounds.
  • 增强商业智能 (BI):
    • 在您的商业活动中,您可以使用数据仓库从任何来源收集、吸收和派生数据。由于能够轻松整合来自多个来源的数据,您的 BI 将得到突飞猛进的改进。
  • Data standardization and Consistency are achieved:
    • The uniformity of huge data is another key benefit of having central data repositories. In a similar manner, a data storage or data mart might benefit your company. Because data warehousing stores data from various sources in a consistent manner, such as a transactional system, each source will produce results that are synchronized with other sources. This ensures that data is of higher quality and homogeneous. As a result, you and your team can rest assured that your data is accurate, resulting in more informed corporate decisions.
  • 实现数据标准化和一致性:
    • 海量数据的一致性是拥有中央数据存储库的另一个关键优势。以类似的方式,数据存储或数据集市可能会使您的公司受益。因为数据仓库以一致的方式存储来自各种来源的数据,例如事务系统,所以每个来源都会产生与其他来源同步的结果。这确保了数据具有更高的质量和同质性。因此,您和您的团队可以放心您的数据是准确的,从而做出更明智的公司决策。
  • Enhances Data Security:
    • A data warehouse improves security by incorporating cutting-edge security features into its design. For any business, consumer data is a vital resource. You can keep all of your data sources integrated and properly protected by adopting a warehousing solution. The risk of a data breach will be greatly reduced as a result of this.
  • 增强数据安全性:
    • 数据仓库通过将尖端的安全功能融入其设计来提高安全性。对于任何企业来说,消费者数据都是至关重要的资源。通过采用仓储解决方案,您可以保持所有数据源的集成和适当保护。因此,数据泄露的风险将大大降低。
  • Ability to store historical data:
    • Because a data warehouse can hold enormous amounts of historical data from operational systems, you can readily study different time periods and inclinations that could be game-changing for your business. You can make better corporate judgments about your business plans if you have the correct facts in your hands.
  • 存储历史数据的能力:
    • 由于数据仓库可以保存来自操作系统的大量历史数据,因此您可以轻松研究可能会改变您的业务游戏规则的不同时间段和倾向。如果您掌握正确的事实,您可以对您的业务计划做出更好的企业判断。
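The data quality and standardisation points above (turning data from several sources into a shared structure and eliminating duplicates) can be sketched in a few lines. The two source layouts, `CRM_ROWS` and `BILLING_ROWS`, and their field names are invented for this example.

```python
# Two hypothetical source systems that describe customers differently.
CRM_ROWS = [
    {"CustomerName": " Alice Smith ", "Country": "us"},
    {"CustomerName": "Bob Lee", "Country": "DE"},
]
BILLING_ROWS = [
    {"name": "alice smith", "country_code": "US"},
    {"name": "Carol Diaz", "country_code": "fr"},
]


def from_crm(row):
    """Map a CRM record onto the shared warehouse structure."""
    return {"name": row["CustomerName"].strip().title(), "country": row["Country"].upper()}


def from_billing(row):
    """Map a billing record onto the shared warehouse structure."""
    return {"name": row["name"].strip().title(), "country": row["country_code"].upper()}


def consolidate(*batches):
    """Merge standardised batches, keeping the first copy of each duplicate."""
    seen = set()
    result = []
    for batch in batches:
        for record in batch:
            key = (record["name"], record["country"])
            if key not in seen:
                seen.add(key)
                result.append(record)
    return result


customers = consolidate(map(from_crm, CRM_ROWS), map(from_billing, BILLING_ROWS))
```

Only after both sources are mapped to the same structure does the duplicate "Alice Smith" record become detectable, which is the practical benefit of standardising before loading.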

8. What are the disadvantages of using a data warehouse?

Following are the disadvantages of using a data warehouse:

  • Loading time of data resources is undervalued: We frequently underestimate the time it will take to gather, sanitize, and post data to the warehouse. Although some resources are in place to minimize the time and effort spent on the process, it may require a significant amount of the overall production time.
  • Source system flaws that go unnoticed: After years of non-discovery, hidden flaws linked with the source networks that provide the data warehouse may be discovered. Some fields, for example, may accept nulls when entering new property information, resulting in workers inputting incomplete property data, even if it was available and relevant.
  • Homogenization of data: Data warehousing also deals with data formats that are comparable across diverse data sources. It’s possible that some important data will be lost as a result.

8. 使用数据仓库有什么缺点?

以下是使用数据仓库的缺点:

  • **数据资源的加载时间被低估:**我们经常低估收集、清理和发布数据到仓库所需的时间。尽管有一些资源可以最大限度地减少在流程上花费的时间和精力,但它可能需要大量的整体生产时间。
  • **未被注意到的源系统缺陷:**经过多年未发现,与提供数据仓库的源网络相关的隐藏缺陷可能会被发现。例如,某些字段在输入新的属性信息时可能会接受空值,从而导致工作人员输入不完整的属性数据,即使它可用且相关。
  • **数据同质化:**数据仓库还处理在不同数据源之间具有可比性的数据格式。结果可能会丢失一些重要数据。
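The "source system flaws" point above mentions fields that silently accept nulls. One simple way to surface such flaws before loading is to measure how often each field is empty. The sketch below assumes a hypothetical list of property records; the field names are illustrative only.

```python
# Hypothetical property records extracted from a source system.
PROPERTY_ROWS = [
    {"address": "1 Main St", "postcode": "A1"},
    {"address": "2 Oak Ave", "postcode": None},
    {"address": "3 Elm Rd", "postcode": ""},
    {"address": "4 Pine Ln", "postcode": "B2"},
]


def null_rates(rows, fields):
    """Return the fraction of rows in which each field is missing or empty."""
    if not rows:
        return {field: 0.0 for field in fields}
    total = len(rows)
    return {
        field: sum(1 for r in rows if r.get(field) in (None, "")) / total
        for field in fields
    }


rates = null_rates(PROPERTY_ROWS, ["address", "postcode"])
```

A high rate for a field that the business considers mandatory is a hint that the source system allows incomplete entries.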

9. What are the different types of data warehouse?

Following are the different types of data warehouse:

9. 数据仓库有哪些不同类型?

以下是不同类型的数据仓库:

  • Enterprise Data Warehouse:
    • An enterprise data warehouse is a database that brings together the various functional areas of an organisation in a cohesive manner. It's a centralised location where all corporate data from various sources and apps can be accessed. Once the data has been stored, it can be used for analytics by everyone in the organisation. The data can be categorised by subject, and access is granted according to the relevant division. The tasks of extracting, transforming, and conforming data are taken care of in an Enterprise Data Warehouse.
    • Enterprise Data Warehouse’s purpose is to provide a comprehensive overview of any object in the data model. This is performed by finding and wrangling the data from different systems. This is then loaded into a model that is consistent and conformed. The data is acquired by Enterprise Data Warehouse, which can provide access to a single site where various tools can be used to execute analytical functions and generate various predictions. New trends or patterns can be identified by research teams, which can then be focused on to help the company expand.
  • 企业数据仓库:
    • 企业数据库是以一种内聚的方式将组织的各个功能领域结合在一起的数据库。它是一个集中位置,可以访问来自各种来源和应用程序的所有公司数据。一旦它们被保存,组织中的每个人都可以用它分析。数据可以按主题分类,并根据必要的划分授予访问权限。提取、转换和一致性的任务在企业数据仓库中处理。
    • 企业数据仓库的目的是提供数据模型中任何对象的全面概述。这是通过查找和整理来自不同系统的数据来执行的。然后将其加载到一致的模型中。数据由企业数据仓库获取,它可以提供对单个站点的访问,在该站点中可以使用各种工具来执行分析功能并生成各种预测。研究团队可以识别新的趋势或模式,然后可以专注于帮助公司扩张。
  • Operational Data Store (ODS):
    • An operational data store is utilised instead of having an operational decision support system application. It facilitates data access directly from the database, as well as transaction processing. By checking the associated business rules, the data in the Operational Data Store may be cleansed, and any redundancy found can be checked and rectified. It also aids in the integration of disparate data from many sources so that business activities, analysis, and reporting may be carried out quickly and effectively while the process is still ongoing.
    • The majority of current operations are stored here before being migrated to the data warehouse for a longer period of time. It is particularly useful for simple searches and little amounts of data. It functions as short-term or temporary memory, storing recent data. The data warehouse keeps data for a long time and also keeps information that is generally permanent.
  • 操作数据存储 (ODS):
    • 使用操作数据存储而不是操作决策支持系统应用程序。它有助于直接从数据库访问数据以及事务处理。通过检查相关的业务规则,可以清理 ODS 中的数据,并且可以检查和纠正发现的任何冗余。它还有助于整合来自多个来源的不同数据,以便在流程仍在进行时快速有效地执行业务活动、分析和报告。
    • 大多数当前操作在迁移到数据仓库较长时间之前都存储在这里。它对于简单的搜索和少量数据特别有用。它用作短期或临时存储器,存储最近的数据。数据仓库可以长时间保存数据,也可以保存通常是永久性的信息。
  • Data Mart:
    • A data mart is a pattern for serving data to a specific group of users in a data warehouse environment. It's a data warehouse-specific structure organised around a team's business domain. Each department can have its own data mart, which is kept in the data warehouse repository. Dependent, independent, and hybrid data marts are the three types of data marts. Dependent data marts take data that has already been loaded into the data warehouse, whereas independent data marts collect data directly from operational or external sources. Data marts can be thought of as logical subsets of a data warehouse.
  • 数据集市:
    • 数据集市被称为在数据仓库环境中获取客户端数据的模式。它是团队业务领域使用的特定于数据仓库的结构。每个公司都有自己的数据集市,保存在数据仓库存储库中。依赖、独立和混合数据集市是数据集市的三种类型。独立数据集市从外部来源和数据仓库收集数据,而从属数据集市则获取已经开发的数据。数据集市可以被认为是数据仓库的逻辑子集。

10. What are the different types of data marts in the context of data warehousing?

Following are the different types of data mart in data warehousing:

10. 在数据仓库的背景下,有哪些不同类型的数据集市?

以下是数据仓库中不同类型的数据集市:

  • Dependent Data Mart: A dependent data mart is built from a central data warehouse, which is itself fed by operational sources, external sources, or both. It lets the organisation source all of its data from a single data warehouse. All data is centralized, which can aid in the development of further data marts.
  • Independent Data Mart: There is no need for a central data warehouse with this data mart. This is typically established for smaller groups that exist within a company. It has no connection to Enterprise Data Warehouse or any other data warehouse. Each piece of information is self-contained and can be used independently. The analysis can also be carried out independently. It’s critical to maintain a consistent and centralized data repository that numerous users can access.
  • Hybrid Data Mart: A hybrid data mart is utilized when a data warehouse contains inputs from multiple sources, as the name implies. When a user requires an ad hoc integration, this feature comes in handy. This solution can be utilized if an organization requires various database environments and quick implementation. It necessitates the least amount of data purification, and the data mart may accommodate huge storage structures. When smaller data-centric applications are employed, a data mart is most effective.
  • **依赖数据集市:**可以使用来自运营、外部或这两种来源的数据来开发依赖数据集市。它允许从单个数据仓库访问源公司的数据。所有数据都是集中的,这有助于进一步开发数据集市。
  • **独立数据集市:**此数据集市不需要中央数据仓库。这通常是为公司内存在的较小团体建立的。它与企业数据仓库或任何其他数据仓库没有任何联系。每条信息都是独立的,可以独立使用。分析也可以独立进行。维护一个可供众多用户访问的一致且集中的数据存储库至关重要。
  • **混合数据集市:**顾名思义,当数据仓库包含来自多个来源的输入时,使用混合数据集市。当用户需要临时集成时,此功能会派上用场。如果组织需要各种数据库环境和快速实施,则可以使用此解决方案。它需要最少的数据净化,并且数据集市可以容纳巨大的存储结构。当采用较小的以数据为中心的应用程序时,数据集市是最有效的。
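Since a data mart can be thought of as a logical subset of a data warehouse, a dependent data mart can be approximated as a view over the central warehouse table. The following `sqlite3` sketch uses hypothetical names (`warehouse_sales`, `mart_marketing`) and is only one possible way to realise the idea; physical marts are often separate tables or databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Central warehouse table (hypothetical).
CREATE TABLE warehouse_sales (department TEXT, region TEXT, amount REAL);
INSERT INTO warehouse_sales VALUES
    ('Marketing', 'East', 100.0),
    ('Marketing', 'West', 150.0),
    ('Finance',   'East', 300.0);

-- Dependent data mart: a subject-specific subset drawn from the warehouse.
CREATE VIEW mart_marketing AS
    SELECT region, amount
    FROM warehouse_sales
    WHERE department = 'Marketing';
""")

mart_rows = conn.execute(
    "SELECT region, amount FROM mart_marketing ORDER BY region"
).fetchall()
```

Because the mart is derived from the warehouse, any correction made centrally is reflected in the mart automatically, which is the main argument for the dependent design.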


Reposted from blog.csdn.net/weixin_45545090/article/details/125554383