--- Introduction to Data Mining

     Data mining (DM) is also known as Knowledge Discovery in Databases (KDD).

      1. What is DM? (What?)

      Simply put, DM means finding valuable knowledge in massive amounts of data. This knowledge can take the form of rules, constraints, patterns, regularities, and so on, and it can be presented and explained using charts, decision trees, association tables, and the like.

      Any discussion of DM has to mention the development of database technology. Database technology has developed continuously, from simple data collection in the 1960s to the DBMS, relational databases, and so on, and it is this continuous development that laid the foundation for the emergence of DM.

      2. Why DM? (Why?)

      Mainly because of the data explosion. Thanks to the rapid development of data collection and storage technologies, every organization can now acquire and accumulate vast amounts of data; companies such as Google and Facebook produce massive amounts of data every day. Extracting useful information from this data with traditional data analysis methods is very challenging, and the concept of data mining arose in response. We can therefore say that data mining is a technique that combines traditional data analysis methods with sophisticated algorithms for processing large-scale data.

      As the saying goes: "We are drowning in data, but thirsting for knowledge."

     3. Where is DM used? (Where?)

     First, a brief look at some data mining techniques:

     1) Association rule discovery: the Apriori algorithm

     2) Cluster analysis: unsupervised learning (learning "without a teacher"); the training data carries no class labels.

     3) Classification models: the training data is given class labels in advance; supervised learning.

     4) Anomaly detection: identifying observations that differ significantly from the rest of the data, that is, finding outliers.

     5) Data cubes, visualization, and the like.
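
     As an illustration of technique 1), here is a minimal sketch of Apriori-style frequent itemset mining. The shopping baskets and the support threshold are invented for the example, and the function name `apriori` is just a label chosen here; this is a teaching sketch, not an optimized implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate itemset,
        # then keep only the candidates that reach the support threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = count({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(baskets, min_support=3)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

     The prune step is what distinguishes Apriori from brute-force counting: a k-itemset can only be frequent if all of its (k-1)-subsets are frequent, so most candidates are discarded before the data is scanned again.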

      A brief description of several applications:

      1) Customer Relationship Management (CRM) ---- association rules over customers, such as those used in shopping recommendations

      2) Web analysis ---- ranking web pages, as in Google web search

      3) Image recognition ---- classification

      4) Bioinformatics ---- sequence patterns; classification and prediction of proteins, gene sequences, and the like

      In general, different applications mine different types of data, and different types of data call for different data mining techniques. Compared with the past, the data types we deal with today are highly varied:

     Structured: data stored in a database, for example; the relationships between the data are clear, so it is easy to analyze.

     Semi-structured: XML data, for example; a structure is visible in the data, but it is not as explicit.

     Unstructured: text files, web page content, streaming video, and so on; there is no clear structural relationship between the data, so it is difficult to analyze and process.

     The development of data collection technologies has produced large amounts of high-dimensional data in complex forms such as sequences, graphs, numbers, and so on, so analyzing large-scale, complex, high-dimensional data is a very important task.
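
     To make the three categories concrete, the sketch below reads one small sample of each kind using only the Python standard library. All of the sample data (names, ages, e-mail address, sentence) is made up for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Structured: fixed columns, as in a database table (hypothetical data).
table = io.StringIO("id,name,age\n1,Alice,30\n2,Bob,25\n")
rows = list(csv.DictReader(table))
ages = [int(r["age"]) for r in rows]

# Semi-structured: XML carries its own tags, but fields may vary per record.
doc = ET.fromstring(
    "<users>"
    "<user id='1'><name>Alice</name><email>alice@example.com</email></user>"
    "<user id='2'><name>Bob</name></user>"
    "</users>"
)
# The second user has no <email> element, so findtext returns None for it.
emails = [u.findtext("email") for u in doc.findall("user")]

# Unstructured: free text has no schema; structure must be extracted first.
text = "Data mining finds patterns in data. Patterns become knowledge."
word_counts = Counter(re.findall(r"[a-z]+", text.lower()))

print(ages)
print(emails)
print(word_counts.most_common(2))
```

     Note how the amount of work grows from top to bottom: the table can be queried directly, the XML has to tolerate missing fields, and the text needs a tokenization step before anything can be counted at all.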

 

     

     4. Data mining process:

      Knowledge discovery process: data cleaning - data warehousing - data selection - data mining - knowledge (in short: data preprocessing, data mining, analysis of results)
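
      The stages above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the records are invented, the warehousing stage is skipped because everything fits in memory, and a simple "more than two standard deviations from the mean" rule stands in for the mining step. The function names `clean`, `select`, and `mine` are hypothetical labels for the stages.

```python
from statistics import mean, stdev

# Hypothetical raw records; None and "" stand for missing values.
raw = [
    {"customer": "a", "amount": "12.5"},
    {"customer": "b", "amount": "11.0"},
    {"customer": "c", "amount": ""},
    {"customer": "d", "amount": "13.2"},
    {"customer": "e", "amount": "12.1"},
    {"customer": "f", "amount": "95.0"},
    {"customer": "g", "amount": None},
    {"customer": "h", "amount": "11.8"},
    {"customer": "i", "amount": "12.9"},
    {"customer": "j", "amount": "10.7"},
    {"customer": "k", "amount": "13.5"},
    {"customer": "l", "amount": "12.3"},
]


def clean(records):
    """Data cleaning: drop records with a missing amount, convert the rest to float."""
    return [
        {**r, "amount": float(r["amount"])}
        for r in records
        if r["amount"] not in (None, "")
    ]


def select(records, field):
    """Data selection: keep only the attribute relevant to the analysis."""
    return {r["customer"]: r[field] for r in records}


def mine(values, threshold=2.0):
    """Data mining: flag values more than `threshold` sample standard
    deviations away from the mean (a simple anomaly detection rule)."""
    mu = mean(values.values())
    sigma = stdev(values.values())
    return {k: v for k, v in values.items() if abs(v - mu) > threshold * sigma}


# Knowledge: the customers whose spending is unusual.
outliers = mine(select(clean(raw), "amount"))
print(outliers)
```

      Keeping each stage as a separate function mirrors the process description: the cleaning and selection steps can be changed without touching the mining step, and the mining step can be swapped for clustering or classification.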

     

 

      5. Data Mining Theory:

       Data mining is an interdisciplinary field that draws on information retrieval, statistics, machine learning, data compression, information theory, and so on.

 

Reproduced from: https://www.cnblogs.com/GuoJiaSheng/p/3995555.html


Origin blog.csdn.net/weixin_33910385/article/details/93614762