"Data Mining" School Online [Chapter 1: Overview] Reference and analysis of answers to exercises

"Data Mining" series of articles table of contents

Chapter 1 Overview
Chapter 2 Data
Chapter 3 Data Preprocessing
Chapter 4 Data Warehouse and OLAP
Chapter 5 Regression Analysis
Chapter 6 Frequent Patterns
Chapter 7 Classification
Chapter 8 Clustering
Chapter 9 Outlier Detection



Chapter 1 Overview

Multiple choice questions

  1. Which of the following is a data mining task ( )
    A. Divide the company's customers according to gender
    B. Calculate the company's total sales
    C. Predict the outcome of a pair of dice
    D. Use historical records to predict the company's future stock price
  2. Which of the following four methods is not a common classification method ( )
    A. Decision tree
    B. Support vector machine
    C. K-Means
    D. Naive Bayes classification
  3. Which step covers the integration, transformation, dimensionality reduction, and numerosity reduction of the original data ( )
    A. Frequent pattern mining
    B. Classification and prediction
    C. Data preprocessing
    D. Data stream mining
  4. KDD is ( )
    A. Data mining and knowledge discovery
    B. Domain knowledge discovery
    C. Document knowledge discovery
    D. Dynamic knowledge discovery
  5. Which of the following statements about outliers is wrong ( )
    A. Generally, outliers are treated as noise and discarded.
    B. Outliers are noise data.
    C. In some special applications, outliers have special meaning.
    D. A credit card suddenly being used for a large purchase in an area where it is rarely used falls within the scope of outlier analysis.
  6. Which of the following can combine data along different dimensions to form a data cube ( )
    A. Database
    B. Data source
    C. Data warehouse
    D. Database system
  7. Which step aims to narrow the value range of the data so that it better suits the needs of data mining algorithms, while still yielding the same analysis results as the original data ( )
    A. Data cleaning
    B. Data integration
    C. Data transformation
    D. Data reduction
  8. Which of the following tasks is an application of data mining technology in business intelligence ( )
    A. Fraud detection
    B. Spam identification
    C. Finding specific Web pages based on Internet search engines
    D. Targeted marketing
  9. Applications of anomaly detection include ( )
    A. Cyber attacks.
    B. Predicting the future price of a certain stock.
    C. Calculating the company's total sales.
    D. Dividing the company's customers according to gender.
  10. Which of the following statements about pattern recognition is incorrect ( )
    A. The essence of pattern recognition is to abstract patterns in different things and classify things accordingly.
    B. Medical diagnosis is one of the research contents of pattern recognition.
    C. Fingerprint unlocking on mobile phones is not an application of pattern recognition.
    D. Natural language understanding also includes pattern recognition problems.
  11. The challenging issues currently faced by data analysis and data mining do not include ( )
    A. Diversification of data types
    B. High-dimensional data
    C. Outlier data
    D. Visualization of analysis and mining results

True or False

  1. Unsupervised learning can learn on unlabeled data sets.
  2. Clustering is to divide some objects into multiple groups or clusters, so that objects in the same group are relatively similar and objects in different groups are very different.
  3. Each record in a transactional database represents a transaction.
  4. A data warehouse and a database are actually the same; both are storage systems for data or information.
  5. Outliers do not require consideration and study because they deviate from the general level.
  6. The main task of data mining is to discover potential rules from data, so as to better complete tasks such as describing data and predicting data.
  7. Data warehouses generally store online transaction data, and databases generally store historical data.
  8. A database is a subject-oriented, integrated, relatively stable data collection that reflects historical changes and is used to support management decisions.
  9. Common machine learning methods include supervised learning, unsupervised learning, and semi-supervised learning.
  10. Frequent patterns refer to patterns that appear frequently in the data set.
  11. Outliers refer to observation objects that deviate from the general level globally or locally.
  12. Regression is to predict discrete labels by building a model, while classification is to infer a certain numerical attribute of new data by building a continuous value model.
  13. Databases are subject-oriented and data warehouses are transaction-oriented.
  14. Discrimination is the comparison of the general characteristics of a target class of data objects with the general characteristics of objects from one or more contrasting classes.
  15. The input objects to the clustering process have target information associated with them.
  16. The goal of data mining is not to collect data, but to discover patterns in existing data.
  17. Data analysis refers to the process of using appropriate statistical analysis methods to analyze and summarize the collected data, describe the data appropriately, and extract useful information.
  18. Definition of data analysis: Data analysis is the analysis of data. Professionally speaking, data analysis refers to using appropriate statistical analysis methods and tools to process and analyze the collected data according to the purpose of analysis, extract valuable information, and make full use of the data.
  19. The process or method of extracting or mining interesting knowledge or patterns from large-scale data is called data mining.
  20. Data mining mainly focuses on solving four types of problems: classification, clustering, association and prediction.
  21. Data analysis refers to analyzing and summarizing collected data using appropriate statistical analysis methods.
  22. The main application of data warehouse systems is online analytical processing.

Analysis

True or False 12
Statement: Regression is to predict discrete labels by building a model, while classification is to infer a certain numerical attribute of new data by building a continuous value model.

Correction: Classification predicts discrete labels by building a model, while regression infers a numerical attribute of new data by building a continuous-valued model. The statement swaps the two definitions, so it is false.
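
To make the distinction concrete, here is a minimal sketch in plain Python. The data set (hours studied, pass/fail labels, exam scores) and the helper names `classify_1nn` and `fit_line` are made up for illustration and are not part of the course material. The classifier returns one of a fixed set of labels, while the regression returns a real number.

```python
# Toy data (hypothetical): hours studied -> exam outcome.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = ["fail", "fail", "fail", "pass", "pass", "pass"]  # discrete labels
scores = [52.0, 58.0, 61.0, 70.0, 74.0, 81.0]              # continuous values


def classify_1nn(x, xs, labels):
    """Classification: return the label of the nearest training point."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return labels[nearest]


def fit_line(xs, ys):
    """Regression: fit y = a * x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


label = classify_1nn(4.6, hours, passed)  # a discrete label
a, b = fit_line(hours, scores)
predicted_score = a * 4.6 + b             # a real number
print(label, round(predicted_score, 1))
```

A nearest-neighbour rule and a straight-line fit are only two simple stand-ins; any classifier or regressor would show the same contrast between a discrete and a continuous output.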


Reflections and summary

About data analysis and data mining - understanding

  • Briefly describe your understanding of data analysis and data mining.
  • List other applications of data analysis and data mining in real life and scientific research work.

In my view, there are two main differences between data mining and data analysis. First, data mining generally handles larger volumes of data than data analysis. Second, data mining does not start with a clearly defined purpose or requirement before the data is processed, whereas data analysis does. Taken together, the two are the same in essence: both discover valuable information in data to help people make better decisions. Both are important tools in the current big data era, and both deserve attention.

In real life, data analysis reveals many regularities that help us avoid risks. In scientific research, data analysis and data mining are an indispensable means of obtaining research results. As shown in the figure, my own research direction, blockchain, applies many data analysis and data mining methods to compile statistics on current trends in blockchain.

About Data Analysis and Data Mining - Technology

  • Drawing on your own research experience, describe your understanding of the technologies commonly used in data mining and data analysis.
  • What are the challenging problems in data mining, and what is your view of them?

My research direction is blockchain. Combining blockchain with data mining and data analysis has significant social and economic value and is an important area of blockchain research. Taking the BlockSci blockchain data analysis framework as an example, the figure shows the [] operator of the BlockSci blockchain object being used to extract the fee rate of each transaction in Bitcoin block #465100 for further analysis. Techniques such as classification, estimation, prediction, description, and visualization may be used here. The analysis shows that most transactions in this block set a fee rate within 500 sat/byte.
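
The kind of fee-rate summary described above can be sketched as follows. This deliberately does not use the BlockSci API. It assumes the fee (in satoshis) and size (in bytes) of each transaction have already been extracted, and the numbers below are invented for illustration, not taken from block #465100.

```python
# Hypothetical transactions as (fee in satoshis, size in bytes).
# These values are made up; they are NOT real data from block #465100.
transactions = [
    (22_600, 226),
    (67_320, 374),
    (111_000, 222),
    (45_200, 226),
    (150_000, 250),
]

THRESHOLD = 500  # sat/byte, the bound mentioned in the text


def fee_rate(fee_sat, size_bytes):
    """Fee rate in satoshis per byte."""
    return fee_sat / size_bytes


rates = [fee_rate(fee, size) for fee, size in transactions]

# Share of transactions whose fee rate is within the threshold.
share_within = sum(r <= THRESHOLD for r in rates) / len(rates)

print([round(r) for r in rates])
print(f"{share_within:.0%} of transactions are within {THRESHOLD} sat/byte")
```

With real block data, the same two steps (compute a per-transaction rate, then describe its distribution) would feed directly into a histogram or other visualization.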

I think the main challenge in data mining today is privacy protection. Solving practical problems inevitably involves private data. For example, a study of the relationship between credit cards and their users will inevitably include users' personal information, and a study of the relationship between cervical cancer (risk factors) and a person's age, number of pregnancies, number of partners, and so on will include private information that is not suitable for disclosure. How to carry out data mining without exposing users' personal privacy, for instance by desensitizing the data, will become an important research topic.
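
As a rough illustration of what desensitizing a record can look like, the sketch below uses only the Python standard library. The field names, the salt, and the sample record are all hypothetical, and the two techniques shown (salted hashing of a direct identifier and coarsening an exact age into a range) are just common examples, not a complete privacy solution.

```python
import hashlib

# Hypothetical salt; a real one would be kept secret and stored separately.
SALT = b"replace-with-a-secret-salt"


def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a truncated salted SHA-256 digest."""
    digest = hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()
    return digest[:12]


def generalize_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def desensitize(record: dict) -> dict:
    """Return a copy of the record with the name hashed and the age coarsened."""
    return {
        "user": pseudonymize(record["name"]),
        "age_range": generalize_age(record["age"]),
        "num_pregnancies": record["num_pregnancies"],
    }


# Made-up record for illustration only.
record = {"name": "Alice Example", "age": 34, "num_pregnancies": 2}
safe = desensitize(record)
print(safe)
```

Pseudonymized records can sometimes still be re-identified by combining the remaining attributes, which is part of why privacy-preserving data mining remains an open problem.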

Note: The answers are for reference only, personal thoughts summarized in 01.

Origin blog.csdn.net/Eechoecho/article/details/123184963