I. Data Mining

Data mining is the process of extracting implicit, previously unknown, and potentially useful relationships, patterns, and trends from large amounts of data (including text), and of using that knowledge to build models that provide predictive decision support. The term also covers the methods, tools, and processes that make this possible. It applies a variety of analysis tools to discover the relationships between models and data in large data sets, and it draws on statistics, database technology, and artificial intelligence.

1. The basic tasks of data mining

The basic tasks of data mining include classification and prediction, clustering, association rules, sequential patterns, deviation detection, and intelligent recommendation, all of which help extract the business value contained in the data.
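As one concrete illustration of these tasks, the sketch below computes the two standard measures used to score an association rule, support and confidence. The transactions and the function names `support` and `confidence` are made up for this example and do not come from any particular library.

```python
# Hypothetical market-basket transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Score the rule "bread -> milk".
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Here two of the four baskets contain both bread and milk, and two of the three bread baskets also contain milk, so the rule has support 0.5 and confidence of about 0.67. A real association-rule miner would search over many candidate itemsets rather than score a single hand-picked rule.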

2. Data mining modeling process

  1. Defining the mining goal
  2. Data sampling
  3. Data exploration
  4. Data preprocessing
  5. Mining modeling
  6. Model evaluation

  1. Defining the mining goal: identify the target, understand the relevant domain knowledge and background, and clarify what the user needs.
  2. Data sampling: once the mining goal is clear, extract from the business systems a sample subset of data relevant to that goal, rather than using all of the data. Selection criteria: relevance, reliability, validity, and completeness. Check the quality of the sampled data: the information and indicators should be complete, and the values should be accurate, with no abnormal values. Sampling methods include, but are not limited to, random sampling, systematic sampling, stratified sampling, sequential sampling from a starting point, and sampling by class.
  3. Data exploration: includes analysis of abnormal values (outliers and the like), missing-value analysis, correlation analysis, and periodicity analysis.
  4. Data preprocessing: data filtering, variable transformation, removal or treatment of outliers, handling of bad data, data normalization, principal component analysis, attribute selection, data reduction, and similar dimensionality-reduction techniques.
  5. Mining modeling: decide which kind of problem the task is (classification, clustering, association rules, sequential patterns, intelligent recommendation, and so on) and choose one or more algorithms to build the model.
  6. Model evaluation: compare the results of the candidate models, choose the best one, then interpret and apply it.
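Steps 2 to 4 above can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not a reference implementation: the `records` data, the helper names, and the choice of the 1.5 × IQR rule for flagging abnormal values are all assumptions made for the example.

```python
import random
import statistics

# Hypothetical records: (group label, measured value); None marks a missing value.
records = [("a", 10.0), ("a", 12.0), ("a", None), ("a", 11.0),
           ("b", 50.0), ("b", 52.0), ("b", 500.0), ("b", 51.0)]


def random_sample(rows, k, seed=0):
    """Simple random sampling without replacement."""
    return random.Random(seed).sample(rows, k)


def systematic_sample(rows, step, start=0):
    """Systematic sampling: every `step`-th row, beginning at `start`."""
    return rows[start::step]


def stratified_sample(rows, per_group, seed=0):
    """Stratified sampling: draw up to `per_group` rows from each label."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    sample = []
    for label in sorted(groups):
        members = groups[label]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


def count_missing(rows):
    """Exploration: count rows whose value is missing."""
    return sum(value is None for _, value in rows)


def iqr_outliers(values):
    """Exploration: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def min_max_normalize(values):
    """Preprocessing: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values equal, so there is nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Step 2: sampling (three of the methods listed above).
sample_random = random_sample(records, 3)
sample_systematic = systematic_sample(records, 2)
sample_stratified = stratified_sample(records, 2)

# Step 3: exploration on the full data set.
missing = count_missing(records)
present = [value for _, value in records if value is not None]
outliers = iqr_outliers(present)

# Step 4: preprocessing, dropping flagged outliers and then normalizing.
cleaned = [v for v in present if v not in outliers]
normalized = min_max_normalize(cleaned)
print(missing, outliers, normalized)
```

Dropping outliers before normalizing matters here: if the value 500.0 were kept, min-max scaling would squeeze every other value into a narrow band near zero. Whether to delete, cap, or keep abnormal values is a judgment that depends on the mining goal defined in step 1.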

3. Common data mining modeling tools

  • Python
  • KEEP

Origin www.cnblogs.com/persist0701/p/11409980.html