Data, information, knowledge, and wisdom form successive levels of an information system. The purpose of data mining is to acquire knowledge and even wisdom, that is, the ability to summarize information and draw inferences from it. Filtering techniques have evolved in this order: SQL query -> search -> recommendation -> clustering and classification.
1. About the data
Statistical description, visualization, similarity and dissimilarity measures; data reduction (simplification): wavelet transform, principal component analysis (PCA)
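To make the PCA item above concrete, here is a minimal sketch that finds the first principal component of a small data set. It uses only the Python standard library and power iteration on the covariance matrix, which is one of several ways to compute the leading eigenvector; in practice a numerical library would normally be used. The function name `first_principal_component` and the toy data are assumptions made for illustration.

```python
import random

def first_principal_component(rows, iters=200):
    """Return (mean, direction, explained_ratio) for the first principal component."""
    n, d = len(rows), len(rows[0])
    # Center each feature so the component describes variance, not the mean.
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the variance along v (the largest eigenvalue).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return mean, v, eigenvalue / total_variance

# Hypothetical toy data: 200 points lying close to a line in 3-D.
rng = random.Random(0)
direction = (2.0, 1.0, -1.0)
rows = []
for _ in range(200):
    t = rng.gauss(0, 1)
    rows.append([t * d + rng.gauss(0, 0.05) for d in direction])

mean, v, ratio = first_principal_component(rows)
# Reduce each 3-D point to a single coordinate along the component.
scores = [sum((r[j] - mean[j]) * v[j] for j in range(3)) for r in rows]
print("direction:", v)
print("explained variance ratio:", ratio)
```

Because the toy points lie almost on a line, one component keeps nearly all of the variance, which is the sense in which PCA "reduces" the data.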
2. What can be done
http://www.cnblogs.com/tornadomeet/p/3395593.html
1. Correlation and association: correlation coefficient, regression analysis; frequent-pattern mining with the FP-Growth and Eclat algorithms
2. Classification:
Linear, log-linear, logistic regression
Tree-based (symbolic): C4.5, CART (the result is a conditional probability)
Probability: Naive Bayes, Bayesian Networks, EM Algorithm
Neural network (perception works like a network; reasoning and working follow rules): BP -> Deep Learning -> DBN, RBM, CNN (suitable for pattern recognition)
SVM (pure mathematical optimization): linearly separable case, kernel trick
Ensemble: bagging (voting), AdaBoost (each round takes the previous learners' results into account), random forest (multiple CART trees)
3. Clustering:
Partitioning methods: K-means
Density-based clustering: DBSCAN, OPTICS, DENCLUE
Hierarchical clustering: BIRCH, Chameleon
Grid-based (partition the space into a grid first): STING, CLIQUE, WaveCluster
Model-based (assume the data follow a probability distribution and fit that distribution to the data): COBWEB, GMM (Gaussian mixture model); among neural network methods, SOM (self-organizing map)
4. Anomaly detection
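As one concrete instance of the clustering methods listed above, the following is a minimal sketch of K-means (the partitioning method). It alternates an assignment step and a centroid update step until the centroids stop moving. The function name `kmeans`, the random initialisation from data points, and the two-blob toy data are assumptions made for illustration, not a reference implementation.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Cluster a list of equal-length tuples into k groups."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points (a simple, common choice).
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical toy data: two well-separated 2-D blobs.
rng = random.Random(1)
blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
blob_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
points = blob_a + blob_b

centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
print([len(c) for c in clusters])
```

K-means needs the number of clusters `k` up front and assumes roughly compact clusters, which is part of why the density-based and hierarchical families in the list exist as alternatives.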
3. Comparison of Classification Algorithms
http://www.cyzone.cn/a/20170422/310196.html
Regression: models a functional relationship between variables. Examples: traffic flow analysis, mail filtering
Decision tree: good at evaluating a series of different features, qualities, and characteristics. Examples: credit evaluation, horse racing results
Random forest: large-scale datasets and items with a large number of features, some of which may be irrelevant. Examples: churn analysis, risk assessment
Naive Bayes: small datasets with significant features. Examples: sentiment analysis, consumer classification
Hidden Markov model: predicting hidden states. Examples: facial expression analysis, weather forecasting
Recurrent neural network: when there is a lot of ordered (sequential) information. Examples: image classification and captioning, political sentiment analysis
Long short-term memory (LSTM) and gated recurrent unit (GRU) neural networks: natural language processing, translation
Convolutional neural network: convolution here means sliding a small set of shared filter weights over local regions of the input; the resulting feature maps are combined by later layers to label the output. Used when there are very large datasets, large numbers of features, and complex classification tasks. Examples: image recognition, text-to-speech, drug discovery
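The convolution operation behind a convolutional neural network can be sketched in a few lines. The example below slides a 2 x 2 kernel over a 4 x 4 image with no padding and stride 1. As in most CNN libraries, the kernel is not flipped, so strictly speaking this is cross-correlation. The function name `conv2d_valid`, the image, and the hand-picked edge kernel are assumptions for illustration; in a real network the kernel weights are learned.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Weighted sum of the local patch whose top-left corner is (i, j).
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Toy image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The same four weights are reused at every position, so the feature map lights up only in the column where the edge sits, regardless of which row it appears in.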
4. Understanding
Regression: function fitting
Trees: if-else rules
Probability: joint distribution
Network: instead of finding an explicit function mapping or a joint distribution, a neural network records and expresses a function-like relationship through its node weights and biases. The analytical form of that function cannot be written down directly (hence the so-called AI black box). The algorithm itself is not complicated but depends heavily on data (akin to intuitive, image-based thinking).
SVM: rigorous mathematical formulation, complex algorithm, low dependence on the amount of data
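The first two intuitions above can be illustrated side by side: regression returns the parameters of a fitted function, while a tree returns a rule that is literally an if-else statement. This is a minimal sketch using a straight-line least-squares fit and a one-level decision tree (a "stump"). The helper names `fit_line`, `fit_stump`, and `predict_stump` and the toy data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

def fit_stump(xs, labels):
    """One-level decision tree: pick the threshold with the fewest errors."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between neighbouring values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y
                         for x, y in zip(xs, labels))
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    return best[1:]  # (threshold, left_label, right_label)

def predict_stump(stump, x):
    threshold, left, right = stump
    # The learned model is nothing more than this if-else rule.
    if x <= threshold:
        return left
    else:
        return right

xs = [1, 2, 3, 4, 5, 6]

# Regression: recover the function y = 2x + 1 from samples.
a, b = fit_line(xs, [3, 5, 7, 9, 11, 13])
print(a, b)  # 2.0 1.0

# Tree: recover the rule "x <= 3.5 -> class 0, else class 1".
stump = fit_stump(xs, [0, 0, 0, 1, 1, 1])
print(stump)  # (3.5, 0, 1)
print(predict_stump(stump, 2), predict_stump(stump, 5))  # 0 1
```

Both models remain readable after training, which is the contrast with the network case, where the learned relationship lives in weights and biases rather than in a formula or rule.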
5. Deep Learning
In other machine learning methods, the most difficult part is preprocessing such as dimensionality reduction, feature selection, and labeling. Deep learning, by contrast, automatically extracts the low-level and high-level features required for classification: it uses big data to learn the features, which suits data whose features are less obvious, such as speech, images, translation, and sentiment analysis. The input is then further processed based on these learned features.
6. Application areas
Text mining and natural language processing
Images, computer vision
Speech
http://blog.csdn.net/lanchunhui/article/category/5842379/7
https://github.com/justdark/dml