Data warehouse and data mining stage exam review questions

Chapter 1 Overview of Data Warehouse and Data Mining

No exercises

Chapter 2 Overview of Data Warehouse

One. True or False

  1. With the emergence of analytical processing, the data processing environment evolved from one centered on a single database into an architected environment built around the data warehouse.
  2. Under transactional (operational) data processing, the data processing environment is mainly centered on a single database.
  3. The data warehouse is a data storage and organization technology that emerged to support an analytical data processing environment.
  4. Being application-oriented is the key feature that distinguishes data warehouses from traditional operational databases.
  5. A data warehouse is constructed by integrating multiple heterogeneous data sources.
  6. Because only initial loading and query operations are performed on the data in a data warehouse, data is stable once it enters the warehouse and is rarely updated.
  7. A data cube must be 3-dimensional.
  8. In a data warehouse, a concept hierarchy defines a sequence of mappings from low-level concepts to more general, higher-level concepts.
  9. The lattice of cuboids is a sequence of mappings defined on a single dimension that maps low-level concepts to more general, higher-level concepts.
  10. The snowflake model reduces the redundancy that may exist in the star model by attaching additional dimension tables to the existing dimension tables.
  11. In the fact constellation model, there can be only one fact table.
  12. In the design process of the data warehouse, we must adhere to the principle of "combining data-driven and demand-driven design, with demand-driven design at the center".

Two. Single choice

  1. Who among the following is known as the "father of data warehousing"? () (Knowledge points: basic concepts of data warehouses; easy)
    A. W. H. Inmon B. E. F. Codd C. Simon D. Pawlak

  2. Which of the following statements about the data warehouse is correct? () (Knowledge point: basic concepts of data warehouse; difficult)
    A. The data in a data warehouse can only come from the operational databases within the organization
    B. The data warehouse arose to meet the needs of transactional data processing
    C. Being subject-oriented is the key feature that distinguishes the data warehouse from the operational database
    D. A data warehouse must serve the enterprise as a whole; it cannot be a department-level data warehouse

  3. Which of the following is not an element of the "information package diagram"? () (Knowledge point: three-level model of data warehouse; difficult)
    A. Dimension B. Concept hierarchy of each dimension and the count at each level
    C. Measure D. Lattice of cuboids

  4. Which of the following does not belong to the logical models of the data warehouse? () (Knowledge point: three-level model of data warehouse; medium)
    A. Star model B. Snowflake model
    C. Measurement model D. Fact constellation model

  5. Which of the following descriptions of the data warehouse design process is correct? () (Knowledge point: data warehouse design; difficult)
    A. Data warehouse design is driven by both data and demand ("data-driven + demand-driven"), but it must be centered on demand.
    B. The data warehouse mainly serves an analytical processing environment, so it is difficult to fully clarify user requirements at design time.
    C. Like a database, a data warehouse gets its data mainly from the business processes of the enterprise.
    D. The design goal of the data warehouse is to improve the performance of transaction processing.

  6. Which of the following statements about the star model is incorrect? () (Knowledge point: three-level model of data warehouse; difficult)
    A. There is one fact table, whose attributes consist of foreign keys to each dimension table plus the corresponding measure data
    B. There is a group of small subsidiary tables, called dimension tables, with one dimension table for each dimension
    C. Every field of the fact table is a fact measure field
    D. Because only one dimension table can be created for each dimension, some information in the dimension tables will be redundant

  7. In the conceptual model of the data warehouse, the data is mapped from the objective world to the subjective perception through (). (Knowledge points: three-level model of data warehouse; easy)
    A. ER model B. Information package diagram
    C. Star model D. Snowflake model

Three. Fill in the blank

  1. The key characteristics of a data warehouse are: ()-oriented, (), stable, and (). (Knowledge points: basic concepts of data warehouse; easy)
  2. The three-level model of the data warehouse includes: conceptual model, () model and physical model. (Knowledge point: three-level model of data warehouse; easy)
  3. When designing a data warehouse, common logical models include: () model, snowflake model and () model; among them, the () model is used for multiple data warehouses. (Knowledge point: three-level model of data warehouse; medium)
  4. As computer applications have deepened, the data processing they perform has been divided into transactional data processing and (). The data warehouse is a new data storage mechanism designed to meet the needs of (). (Knowledge points: basic concepts of data warehouse; medium)
  5. Data warehouses and databases are designed differently. Database design follows the SDLC method, while data warehouse design follows the () method. (Knowledge point: design of data warehouse; medium)

Chapter 3 Online Analytical Processing (OLAP)

One. True or False

  1. OLTP is a multi-dimensional data analysis technology.
  2. OLTP is the main application of relational databases.
  3. Compared with the confirmatory analysis process of OLAP technology, data mining technology shows higher automatic learning ability.
  4. The drill-up operation of OLAP is to observe from summary data to detailed data on a certain dimension.

Two. Single choice

  1. The core of OLAP technology is: () (Knowledge points: basic concepts of OLAP; medium)
    A. Online operation B. Fast response to users
    C. Interoperability D. Multidimensional analysis

  2. Which of the following descriptions of the difference between OLAP and OLTP is incorrect? () (Knowledge point: the basic concept of OLAP; difficult)
    A. OLAP mainly serves the senior management of the enterprise to support decision-making, while OLTP mainly serves front-line staff to support daily business.
    B. Unlike OLAP, OLTP needs to handle a large number of relatively simple transactions.
    C. OLAP is characterized by a large volume of transactions processed at one time, but the transactions are relatively simple and highly repetitive.
    D. OLAP is based on the data warehouse, but its ultimate data source is the same as that of OLTP: mostly the underlying database systems.

Three. Fill in the blank

  1. Common OLAP analysis methods include: (), dicing, drilling and (). (Knowledge points: the basic concepts of OLAP; medium)
  2. There are several ways to organize OLAP data: ROLAP, () and (). (Knowledge points: the basic concepts of OLAP; medium)

Four. Multiple choice (select all that apply)

  1. Which of the following are common operations of OLAP? () (Knowledge points: the basic concepts of OLAP; easy)
    A. Slice B. Dice C. Drill D. Rotate

Chapter 4 Basic Concepts of Data Mining

One. Single choice

  1. A supermarket analyzed its sales records and found that people who buy bread also tend to buy milk. What kind of data mining problem is this? () (Knowledge points: basic concepts of data mining; medium)
    A. Association rule discovery B. Clustering
    C. Classification D. Outlier detection
  2. For a data set without class label attributes, which technique can be used to group similar data and separate it from other data? () (Knowledge points: basic concepts of data mining; difficult)
    A. Association rule discovery B. Clustering
    C. Classification D. Outlier detection
  3. Assuming that the current data mining task is to identify the typical characteristics of spam, the data mining function commonly used is: () (Knowledge points: basic concepts of data mining; medium)
    A. Association analysis B. Classification and prediction
    C. Concept description D. Cluster analysis

Two. True or False

  1. In cluster analysis, the greater the similarity within a class (cluster) and the greater the difference between classes (clusters), the better the clustering result.
  2. The "beer and diapers" case is a typical case of cluster analysis.

Chapter 5 Data Preprocessing

One. Single choice

  1. For the interval [240,460], the 3-4-5 rule for natural partitioning divides it into: () (Knowledge point: data preprocessing; medium)
    A. [200,300), [300,400), [400,500]
    B. [300,350), [350,400), [400,450), [450,500]
    C. [200,250), [250,300), [300,350), [350,400]
    D. [200,300), [300,400]

  2. Given a set of price data: 15, 21, 24, 21, 25, 4, 8, 34, 28. If it is smoothed by equal-width binning (width 10), how many bins are produced? () (Knowledge points: data preprocessing; easy)
    A. 3 B. 4 C. 5 D. 6

  3. Assuming that the mean and standard deviation of the attribute income are $54000 and $16000, respectively, z-score normalization transforms the attribute value $73600 into: () (Knowledge points: data preprocessing; medium)
    A. 0.736 B. 0.716 C. 1.225 D. 1

  4. Which of the following descriptions of data reduction is wrong? () (Knowledge point: data preprocessing; difficult)
    A. Data reduction techniques can produce a reduced representation of the data set that is much smaller yet closely maintains the integrity of the original data.
    B. Mining the reduced data set is more efficient and produces the same (or almost the same) results.
    C. The time spent on data reduction may exceed or "offset" the time saved by mining the reduced data set.
    D. Dimension reduction detects and removes irrelevant, weakly relevant, or redundant attributes (dimensions).

  5. In which step are integration, transformation, dimension reduction, and numerosity reduction of the original data performed? () (Knowledge points: data preprocessing; medium)
    A. Frequent pattern mining B. Classification and prediction
    C. Data preprocessing D. Data stream mining

Two. Multiple choice (select all that apply)

  1. In real-world data, it is common for tuples to have missing values for some attributes. Common methods for dealing with this problem include: () (Knowledge point: data preprocessing; medium)
    A. Ignore the tuple B. Fill in the missing value with a global constant
    C. Fill in the missing value with the attribute mean D. Fill in the missing value with the most probable value
    E. Fill in the missing value with the attribute mean of all samples in the same class as the given tuple
  2. Which of the following methods are data normalization methods? () (Knowledge points: data preprocessing; difficult)
    A. Min-max normalization B. Decimal scaling normalization
    C. 3-4-5 rule D. Z-score normalization
  3. In dimension reduction, the common heuristic methods used for attribute subset selection are: () (Knowledge point: data preprocessing; difficult)
    A. Stepwise forward selection B. Stepwise backward elimination
    C. Combination of forward selection and backward elimination D. Decision tree induction

Three. Fill in the blank

  1. The three supporting technologies of business intelligence are: (), () and data mining. (Knowledge points: basic concepts of business intelligence; easy)
  2. Common data normalization methods are: (), zero mean normalization, and (). (Knowledge points: data preprocessing; medium)

Chapter 6 Concept Description: Characterization and Comparison

One. Single choice

  1. Which of the following is not a data generalization operation? () (Knowledge point: conceptual description; medium)
    A. Aggregate an n-dimensional data cube into an (n-1)-dimensional data cube
    B. Use OLAP to roll up data
    C. Examine the number of distinct values of each attribute in the task-relevant data, and generalize the data accordingly
    D. Use min-max normalization to scale the data into a small specific interval

  2. What is AOI: () (Knowledge point: conceptual description; easy)
    A. Attribute-oriented induction B. Attribute correlation analysis
    C. Knowledge discovery in database D. Attribute subset selection

  3. Which of the following descriptions of attribute-oriented induction is correct? () (Knowledge point: conceptual description; difficult)
    A. The attribute generalization threshold is a parameter used to control the number of attributes in the data set
    B. Attribute-oriented induction can never produce identical rows
    C. The generalized relation threshold is a parameter used to control the number of generalized tuples
    D. Attribute-oriented induction is a method of selecting attributes based on the correlation between the attributes and the decision task

  4. What is DW: () (Knowledge points: basic concepts of data warehouse; easy)
    A. Domain knowledge discovery B. Machine learning
    C. Data mining D. Data warehouse

  5. After attribute-oriented induction is performed on a shopping mall's 2016 sales data, the following data table is obtained. Suppose the target class is "refrigerator"; then the following quantitative description rule can be derived from the data table:
    ∀X, item(X)="refrigerator" ⇒ (location(X)="Northeast")[t1:( )] ∨ (location(X)="North China")[t2:( )]
    where t1 and t2 are the t-weights of the quantitative description rule. The values of t1 and t2 are: (). (Knowledge points: conceptual description; difficult)
    [Data table: image not available]
    A. 0.43 0.57; B. 0.5 0.5;
    C. 0.33 0.67; D. 0.4 0.6;

Two. Multiple choice (select all that apply)

  1. Common indicators for measuring central tendency of data include: () (Knowledge point: conceptual description; medium)
    A. Mean B. Median
    C. Mode D. Interquartile range
    E. Variance

Three. Fill in the blank

  1. After attribute-oriented induction is performed on a shopping mall's 2002 sales data, the following data table is obtained.
    [Data table: image not available]
    Suppose the target class is "TV"; then the following quantitative description rule can be derived from the data table:
    ∀X, item(X)="TV" ⇒ (location(X)="Asia")[t1:( )] ∨ (location(X)="Europe")[t2:( )]
    where t1 and t2 are the t-weights of the quantitative description rule. Then: t1=( ), t2=( ).
    (Knowledge points: concept description; medium)

Answers to exercises

[Chapter 2]
1. (True; knowledge points: basic concepts of data warehouse; easy)
2. (True; knowledge points: basic concepts of data warehouse; easy)
3. (True; knowledge points: basic concepts of data warehouse; easy)
4. (False; knowledge points: basic concepts of data warehouse; medium)
5. (True; knowledge points: basic concepts of data warehouse; medium)
6. (True; knowledge points: basic concepts of data warehouse; difficult)
7. (False; knowledge point: data cube; easy)
8. (True; knowledge point: data cube; medium)
9. (False; knowledge point: data cube; difficult)
10. (True; knowledge point: three-level model of data warehouse; medium)
11. (False; knowledge point: three-level model of data warehouse; difficult)
12. (False; knowledge points: basic concepts of data warehouse; medium)
ACDCBCB
1. Subject, integrated, reflecting historical change (time-variant) (knowledge point: basic concepts of data warehouse; easy)
2. Logical model (knowledge point: three-level model of data warehouse; easy)
3. Star, fact constellation, fact constellation (knowledge point: three-level model of data warehouse; medium)
4. Analytical data processing, analytical data processing (knowledge points: basic concepts of data warehouse; medium)
5. CLDS (knowledge point: design of data warehouse; medium)

[Chapter 3]
1. (False; knowledge points: basic concepts of OLAP; easy)
2. (True; knowledge points: basic concepts of OLAP; easy)
3. (True; knowledge points: basic concepts of OLAP; difficult)
4. (False; knowledge points: basic concepts of OLAP; medium)
DC
1. Common OLAP analysis methods include: (slicing), dicing, drilling and (rotating). (Knowledge points: the basic concepts of OLAP; medium)
2. OLAP data organization methods are as follows: ROLAP, (MOLAP) and (HOLAP). (Knowledge points: the basic concepts of OLAP; medium)
ABCD
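
The four operations in the answers above can be sketched on a toy cube. This is only a minimal illustration under stated assumptions, not a real OLAP engine: the cube contents, the dimension names, and the helper functions `slice_cube`, `dice_cube`, `roll_up` and `pivot` are all hypothetical. Note that roll-up (drill-up) moves from detailed data to summary data, which is why true-or-false item 4 is false.

```python
from collections import defaultdict

# Hypothetical sales cube: (year, quarter, city, item) -> units sold.
# All values are made up for illustration.
DIMS = ("year", "quarter", "city", "item")
CUBE = {
    (2016, "Q1", "Beijing", "TV"): 10,
    (2016, "Q1", "Beijing", "refrigerator"): 4,
    (2016, "Q2", "Beijing", "TV"): 7,
    (2016, "Q1", "Shanghai", "TV"): 5,
    (2016, "Q2", "Shanghai", "refrigerator"): 6,
}


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    def keep(key):
        return all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}


def roll_up(cube, keep_dims):
    """Roll-up (drill-up): sum away every dimension not listed in keep_dims."""
    idx = [DIMS.index(d) for d in keep_dims]
    out = defaultdict(int)
    for key, units in cube.items():
        out[tuple(key[i] for i in idx)] += units
    return dict(out)


def pivot(table):
    """Rotate (pivot): swap the two axes of a 2-D view."""
    return {(b, a): v for (a, b), v in table.items()}


by_city_item = roll_up(CUBE, ["city", "item"])
print(slice_cube(CUBE, "quarter", "Q1"))
print(dice_cube(CUBE, city={"Beijing"}, item={"TV"}))
print(roll_up(CUBE, ["city"]))
print(pivot(by_city_item))
```

Drill-down is the inverse of `roll_up`: it returns to a view that keeps more dimensions (or a finer level of one), so it needs the detailed data to still be available.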

[Chapter 4]
ABB
1. (True; knowledge points: basic concepts of data mining; medium)
2. (False; knowledge points: basic concepts of data mining; easy)

[Chapter 5]
AACCC
ABCDE ABD ABCD
1. The three supporting technologies of business intelligence are: (data warehouse), (OLAP) and data mining. (Knowledge points: basic concepts of business intelligence; easy)
2. Common data normalization methods are: (min-max normalization), zero-mean normalization, and (decimal scaling normalization). (Knowledge points: data preprocessing; medium)
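
The answers to single-choice questions 2 and 3 of this chapter can be checked with a short sketch. The function names `z_score` and `equal_width_bins` are hypothetical, and the binning convention is an assumption: the first bin starts at the minimum value and the maximum is folded into the last bin. Other edge conventions (for example, bins aligned to multiples of 10) can give a different count.

```python
import math


def z_score(value, mean, std):
    """Z-score (zero-mean) normalization: distance from the mean in standard deviations."""
    return (value - mean) / std


def equal_width_bins(values, width):
    """Partition values into consecutive bins of the given width.

    Assumption: the first bin starts at min(values) and the maximum
    value is placed in the last bin.
    """
    low, high = min(values), max(values)
    n_bins = max(1, math.ceil((high - low) / width))
    bins = [[] for _ in range(n_bins)]
    for v in sorted(values):
        i = min(int((v - low) // width), n_bins - 1)
        bins[i].append(v)
    return bins


prices = [15, 21, 24, 21, 25, 4, 8, 34, 28]
bins = equal_width_bins(prices, 10)
print(len(bins), bins)                        # question 2: number of bins
print(round(z_score(73600, 54000, 16000), 3))  # question 3: normalized income
```

Under this convention the price data falls into the bins [4, 8], [15, 21, 21] and [24, 25, 28, 34], and the income value maps to (73600 - 54000) / 16000 = 1.225.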

[Chapter 6]
DACDC ABC
0.4 0.6
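
The t-weight of a generalized tuple is its count divided by the total count of all generalized tuples of the target class. The sketch below shows that computation only. The data tables for these questions are not reproduced here, so the counts and region names are made up and are not the values behind the answers above; the function name `t_weights` is also hypothetical.

```python
def t_weights(counts):
    """Return the t-weight of each generalized tuple of one target class.

    t-weight = count of the tuple / total count over the target class.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("target class has no tuples")
    return {key: count / total for key, count in counts.items()}


# Hypothetical counts for one target class, NOT taken from the missing tables.
example_counts = {"region_a": 30, "region_b": 50, "region_c": 20}
weights = t_weights(example_counts)
print(weights)
```

The t-weights of one target class always sum to 1, which is a quick sanity check on any answer to this kind of question.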

Origin blog.csdn.net/ljw_study_in_CSDN/article/details/109146185