Data Mining Overview

Data, information, knowledge, and wisdom are successive levels of an information system. The purpose of data mining is to acquire knowledge and even wisdom, that is, the ability to summarize information and reason from it. The ways of filtering information have evolved from SQL queries, to search, to recommendation, and then to clustering and classification.

1. About the data

Statistical description, visualization, similarity and dissimilarity measures; data reduction (simplification), such as wavelet transform and principal component analysis (PCA)
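As an illustration of data reduction, here is a minimal PCA sketch with NumPy, assuming numeric data in a 2-D array whose rows are samples. It centres the data and takes the top right-singular vectors from an SVD. The function name `pca` and the toy data are hypothetical, not taken from any particular library.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto its top principal components.

    Returns the projected data and the fraction of total variance
    explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centred data
    # Rows of Vt are the principal directions, sorted by decreasing
    # singular value; squared singular values are proportional to variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    projected = X_centered @ components.T
    explained_ratio = S**2 / np.sum(S**2)
    return projected, explained_ratio[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.normal(size=100)
    # Hypothetical toy 3-D data that varies mostly along one direction, plus small noise
    X = np.column_stack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(100, 3))
    Z, ratio = pca(X, n_components=1)
    print(Z.shape, ratio)  # one column kept; ratio[0] is close to 1 for this data
```

Taking the SVD of the centred data avoids forming the covariance matrix explicitly; the explained-variance ratio indicates how much of the original spread the kept components retain.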

2. What can be done

http://www.cnblogs.com/tornadomeet/p/3395593.html

1. Correlation and association: correlation coefficient, regression analysis; association rule mining with the FP-Growth and Eclat algorithms
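Eclat finds frequent itemsets by storing, for each item, the set of transaction IDs that contain it (a vertical layout) and intersecting those sets to count larger itemsets. A minimal sketch in plain Python, assuming transactions are given as sets of items and `min_support` is an absolute transaction count; the function name and the basket data are illustrative only.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset.

    transactions: iterable of sets of items.
    min_support: minimum number of transactions an itemset must appear in.
    """
    # Vertical layout: item -> set of IDs of the transactions containing it
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs whose union with prefix is frequent
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Only join with later items, so each itemset is generated once
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                joint = tids & other_tids  # transactions containing both
                if len(joint) >= min_support:
                    next_candidates.append((other, joint))
            if next_candidates:
                extend(itemset, next_candidates)

    singletons = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), singletons)
    return frequent


if __name__ == "__main__":
    # Hypothetical toy shopping baskets
    baskets = [
        {"bread", "milk"},
        {"bread", "diaper", "beer", "eggs"},
        {"milk", "diaper", "beer", "cola"},
        {"bread", "milk", "diaper", "beer"},
        {"bread", "milk", "diaper", "cola"},
    ]
    result = eclat(baskets, min_support=3)
    for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

With `min_support=3` on these five toy baskets, the sketch reports four frequent single items and four frequent pairs. FP-Growth reaches the same itemsets through a different route (a compressed prefix tree) and is not shown here.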

2. Classification:

Linear models: linear, log-linear, and logistic regression

Tree-based (symbolic): C4.5, CART (the result is a conditional probability)

Probabilistic: Naive Bayes, Bayesian networks, EM algorithm

Neural networks (perception works through networks, while reasoning and acting work through rules): BP -> deep learning -> DBN, RBM, CNN (suited to pattern recognition)

SVM (a mathematical optimization method): the linearly separable case, kernel techniques

Ensembles: bagging (voting), AdaBoost (each round takes the errors of the previous rounds into account), random forest (many CART trees)
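For the linear-model family, a minimal logistic regression sketch with NumPy, trained by batch gradient descent on the mean cross-entropy loss. The learning rate, epoch count, function names, and toy data are assumptions for illustration; libraries such as scikit-learn provide tuned, regularized implementations.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on mean cross-entropy."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)                 # predicted P(y = 1 | x)
        grad = Xb.T @ (p - y) / len(y)      # gradient of the mean cross-entropy
        w -= lr * grad
    return w


def predict_proba(w, X):
    """Return P(y = 1 | x) for each row of X."""
    X = np.asarray(X, dtype=float)
    return sigmoid(w[0] + X @ w[1:])


if __name__ == "__main__":
    # Hypothetical toy data: one feature, two separable classes
    X = [[0.0], [1.0], [2.0], [7.0], [8.0], [9.0]]
    y = [0, 0, 0, 1, 1, 1]
    w = fit_logistic_regression(X, y)
    print(predict_proba(w, [[0.0], [9.0]]))
```

The model outputs a probability; a class is obtained by thresholding it, typically at 0.5.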

 

3. Clustering:

Partitioning methods: K-means

Density-based clustering: DBSCAN, OPTICS, DENCLUE

Hierarchical clustering: BIRCH, Chameleon

Grid-based (divide the data space into a grid first): STING, CLIQUE, WaveCluster

Probability-based (assume the data follow a probability distribution and fit that distribution to the data): COBWEB, GMM (Gaussian mixture model); among neural-network methods, SOM (self-organizing maps)
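K-means, the partitioning method listed above, alternates between assigning each point to its nearest centre and moving each centre to the mean of its points. A minimal NumPy sketch, assuming Euclidean distance and initial centres drawn at random from the data (a simplification; k-means++ initialization is common in practice). The names and data are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: return (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre: shape (n, k)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments no longer change the centres
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated blobs
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels, centers = kmeans(X, k=2)
    print(labels)
    print(centers)
```

K-means needs k in advance and favours compact, roughly spherical clusters, which is why density-based methods such as DBSCAN are listed as an alternative for irregular shapes.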

4. Anomaly detection

 

3. Comparison of Classification Algorithms

http://www.cyzone.cn/a/20170422/310196.html

Regression: establishes a functional relationship between variables. Examples: traffic flow analysis, email filtering.

Decision tree: good at evaluating a series of different features, qualities, and characteristics. Examples: credit evaluation, horse racing results.

Random forest: large datasets with many features, some of them possibly irrelevant. Examples: customer churn analysis, risk assessment.

Naive Bayes: small datasets with clearly informative features. Examples: sentiment analysis, consumer classification.

Hidden Markov model: predicting hidden states from observations. Examples: facial expression analysis, weather forecasting.

Recurrent neural network: large amounts of ordered (sequential) information. Examples: image classification and captioning, political sentiment analysis.

Long short-term memory (LSTM) and gated recurrent unit (GRU) networks: natural language processing, translation.

Convolutional neural network: a convolutional layer slides small shared-weight filters across the input, so a local pattern is detected wherever it appears; later layers combine these features to label the output. Suited to very large datasets, large numbers of features, and complex classification tasks. Examples: image recognition, text-to-speech, drug discovery.
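For the hidden Markov model entry, hidden states are commonly predicted with the Viterbi algorithm, which finds the single most probable hidden-state sequence for a sequence of observations. A minimal sketch in plain Python; the two weather states, the observed activities, and all probabilities are invented toy values.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden-state path, its probability)."""
    # best[t][s]: probability of the best path that ends in state s at step t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: previous state on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace the back-pointers from the most probable final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, best[-1][last]


if __name__ == "__main__":
    # Hypothetical toy model: hidden weather, observed activity
    states = ("Sunny", "Rainy")
    start_p = {"Sunny": 0.6, "Rainy": 0.4}
    trans_p = {
        "Sunny": {"Sunny": 0.7, "Rainy": 0.3},
        "Rainy": {"Sunny": 0.4, "Rainy": 0.6},
    }
    emit_p = {
        "Sunny": {"walk": 0.5, "shop": 0.4, "umbrella": 0.1},
        "Rainy": {"walk": 0.1, "shop": 0.3, "umbrella": 0.6},
    }
    print(viterbi(["walk", "shop", "umbrella"], states, start_p, trans_p, emit_p))
```

With these toy numbers the observations walk, shop, umbrella decode to Sunny, Sunny, Rainy. For long sequences, practical implementations work with log-probabilities to avoid numerical underflow.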

 

4. Understanding

Regression: function fitting

Trees: if-else rules

Probability: joint distributions

Neural networks: instead of finding an explicit functional mapping or joint distribution, a network records a function-like mapping in the weights and biases of its nodes. The mapping cannot be written down as an analytical function (hence the so-called AI black box). The algorithm itself is not complicated but depends heavily on data (similar to intuitive, image-based thinking).

SVM: a rigorous mathematical function; the algorithm is complex but depends relatively little on the amount of data.
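To make "trees are if-else rules" concrete, here is a minimal sketch of how a CART-style tree could choose one split: try every midpoint between sorted feature values and keep the one with the lowest weighted Gini impurity. The result is a single `if x <= threshold` rule; a full tree would apply this recursively to each branch. The income feature and credit decisions are hypothetical toy data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, weighted Gini) for the best rule `x <= threshold`."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: applicant income (feature) and credit decision (label)
    incomes = [20, 35, 50, 80, 95]
    decisions = ["reject", "reject", "approve", "approve", "approve"]
    threshold, score = best_split(incomes, decisions)
    left = [d for x, d in zip(incomes, decisions) if x <= threshold]
    right = [d for x, d in zip(incomes, decisions) if x > threshold]
    # Each branch predicts its majority label, which turns the split into an if-else rule
    print(f"if income <= {threshold}: {Counter(left).most_common(1)[0][0]}")
    print(f"else: {Counter(right).most_common(1)[0][0]}  (weighted Gini {score})")
```

C4.5 follows the same pattern but ranks splits by information gain ratio instead of Gini impurity.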

5. Deep Learning

In other machine learning methods, the hardest parts are dimensionality reduction, feature selection, and preprocessing such as labeling. Deep learning, by contrast, automatically extracts the low-level and high-level features needed for classification, learning them from large amounts of data; this suits speech, images, translation, sentiment analysis, and other data whose features are not obvious. The input is then processed further on the basis of these learned features.
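The feature learning described above rests on backpropagation (the BP step in the neural-network lineage). The sketch trains a tiny two-layer network with NumPy on XOR, a problem no linear model can solve, so the hidden layer has to learn its own intermediate features. It only illustrates the mechanism under assumed hyperparameters (8 hidden units, learning rate 0.5, 5000 epochs); real deep networks are far larger and are normally built with a framework.

```python
import numpy as np


def train_xor_mlp(hidden=8, lr=0.5, epochs=5000, seed=0):
    """Train a 2-layer tanh/sigmoid network on XOR with full-batch backpropagation."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    W1 = rng.normal(size=(2, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: the hidden activations are the learned features
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        p_safe = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0) in the loss
        losses.append(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe)))
        # Backward pass: chain rule from the output back to the first layer
        d_logit = (p - y) / len(X)  # gradient of sigmoid + mean cross-entropy
        dW2 = h.T @ d_logit
        db2 = d_logit.sum(axis=0)
        d_h = (d_logit @ W2.T) * (1 - h**2)  # tanh'(a) = 1 - tanh(a)^2
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        # Gradient-descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    """Return the network's output probability for each row of X."""
    W1, b1, W2, b2 = params
    h = np.tanh(np.asarray(X, dtype=float) @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))


if __name__ == "__main__":
    params, losses = train_xor_mlp()
    print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
    print(predict(params, [[0, 0], [0, 1], [1, 0], [1, 1]]).round(2).ravel())
```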

6. Application areas

Text mining and natural language processing

Images and computer vision

Speech

 

http://blog.csdn.net/lanchunhui/article/category/5842379/7

https://github.com/justdark/dml

 
