Machine Learning with Python: Brief Notes

Machine Learning

Overview

  1. Basic categories: supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), semi-supervised learning (learns from both labeled and unlabeled data), reinforcement learning, deep learning (neural-network-based classification and regression), transfer learning (migrating knowledge from one domain to another domain that lacks data), and structured learning (the output is a structured object; covers problems beyond regression and classification, such as information retrieval and object matching).

  2. Common scikit-learn functions by category:

    | Task | Application | Algorithms |
    | ---- | ----------- | ---------- |
    | Classification | anomaly detection, image recognition | KNN, SVM |
    | Clustering | customer segmentation, grouping populations | K-means, spectral clustering |
    | Regression | price forecasting, trend prediction | linear regression, SVR |
    | Dimensionality reduction | visualization | PCA, NMF |

  3. Books and courses: *Machine Learning* by Zhou Zhihua, *PRML* by Bishop; online courses: Andrew Ng's machine learning course, Stanford CS231n, and David Silver's reinforcement learning course.

  4. Classification task: the output of the model is a vector (of category labels).

    Related function modules: sklearn.linear_model (mainly linear functions) and sklearn.preprocessing (mainly non-linear feature transforms).

    Applications of classification algorithms:

    Finance: assessing whether to approve a loan

    Medical diagnosis: determining whether a tumor is malignant or benign

    Fraud detection

    Web analytics: determining the category of a web page

    | Classification model | Module | Usage |
    | -------------------- | ------ | ----- |
    | Nearest neighbors | neighbors.KNeighborsClassifier | fit() on the training data, then predict() |
    | SVM | svm.SVC | |
    | Naive Bayes | naive_bayes.GaussianNB | |
    | Decision tree | tree.DecisionTreeClassifier | cross_val_score with 10-fold cross-validation; verify with fit() and predict() |
    | Ensemble methods | ensemble.BaggingClassifier | |
    | Neural network | neural_network.MLPClassifier | |
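
    For illustration, a minimal sketch of the fit()/predict() workflow shared by all the classifiers above (the iris dataset and n_neighbors=5 are placeholder choices, not from the original notes):

    ```python
    # A minimal sketch of the shared fit()/predict() classifier workflow.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    clf = KNeighborsClassifier(n_neighbors=5)  # any model in the table has the same API
    clf.fit(X_train, y_train)                  # learn from the training data
    y_pred = clf.predict(X_test)               # predict labels for unseen data
    print(clf.score(X_test, y_test))           # mean accuracy on the test set
    ```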

  5. Regression task: mainly the linear_model module; the output of the model is a numerical value (a scalar).

    | Regression model | Module |
    | ---------------- | ------ |
    | Ridge regression | linear_model.Ridge |
    | Lasso regression | linear_model.Lasso |
    | Elastic net | linear_model.ElasticNet |
    | Least angle regression | linear_model.Lars |
    | Bayesian regression | linear_model.BayesianRidge |
    | Logistic regression | linear_model.LogisticRegression |
    | Polynomial regression | preprocessing.PolynomialFeatures |

    (Not to be confused with the naive Bayes classifiers, which come in Gaussian, multinomial, and multivariate Bernoulli variants.)
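
    For the last row, a minimal sketch (with invented toy data) of how PolynomialFeatures combines with an ordinary linear model to perform polynomial regression:

    ```python
    # Polynomial regression: PolynomialFeatures expands the inputs, then a
    # linear model fits the expanded features. The quadratic data is made up.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    X = np.linspace(0, 1, 20).reshape(-1, 1)
    y = 1.5 * X.ravel() ** 2 + 0.5            # y is a quadratic function of x

    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    model.fit(X, y)
    print(model.predict([[0.5]]))             # close to the true value 0.875
    ```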

  6. Clustering task: mainly the cluster module; common similarity measures include Euclidean distance, Manhattan distance, Mahalanobis distance, and cosine similarity.

    | Clustering algorithm | Module |
    | -------------------- | ------ |
    | K-means | cluster.KMeans |
    | Affinity propagation | cluster.AffinityPropagation |
    | Mean shift | cluster.MeanShift |
    | Hierarchical clustering | cluster.AgglomerativeClustering |
    | DBSCAN | cluster.DBSCAN |
    | BIRCH | cluster.Birch |
    | Spectral clustering | cluster.SpectralClustering |

    Characteristics of selected sklearn.cluster algorithms:

    | Algorithm | Key parameters | Scalability | Similarity measure |
    | --------- | -------------- | ----------- | ------------------ |
    | K-means | number of clusters | large-scale data | distance between points |
    | DBSCAN | neighborhood size | large-scale data | distance between points |
    | Gaussian mixture | number of clusters and others | high complexity; not suited to large-scale data | Mahalanobis distance |
    | BIRCH | branching factor, threshold, and other hyperparameters | large-scale data | Euclidean distance between points |
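
    A minimal K-means sketch (synthetic data, invented here) showing the key parameter, the number of clusters:

    ```python
    # K-means on synthetic blobs, generated purely for illustration.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

    km = KMeans(n_clusters=3, n_init=10, random_state=42)  # cluster count is the key parameter
    labels = km.fit_predict(X)                             # cluster index assigned to each sample
    print(km.cluster_centers_)                             # coordinates of the 3 learned centers
    ```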

  7. Dimensionality reduction task: mainly the decomposition module; used for data visualization and for compressing data to fewer dimensions.

    | Dimensionality reduction method | Module |
    | ------------------------------- | ------ |
    | Principal component analysis | decomposition.PCA |
    | Truncated SVD and LSA | decomposition.TruncatedSVD |
    | Dictionary learning | decomposition.SparseCoder |
    | Factor analysis | decomposition.FactorAnalysis |
    | ICA | decomposition.FastICA |
    | NMF | decomposition.NMF |
    | LDA | decomposition.LatentDirichletAllocation |

    | Algorithm | Key parameters | Scalability | Typical tasks |
    | --------- | -------------- | ----------- | ------------- |
    | PCA | target dimensionality and other hyperparameters | large-scale data | signal processing |
    | FastICA | target dimensionality and other hyperparameters | very large-scale data | image feature extraction |
    | NMF | target dimensionality and other hyperparameters | large-scale data | image feature extraction |
    | LDA | target dimensionality and other hyperparameters | large-scale data | topic mining in text data |
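
    A minimal PCA sketch (using the iris dataset as a stand-in) that reduces 4-dimensional data to 2 dimensions for visualization:

    ```python
    # PCA projection of the 4-D iris data down to 2-D.
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)
    pca = PCA(n_components=2)             # target dimensionality
    X_2d = pca.fit_transform(X)
    print(X_2d.shape)                     # (150, 2)
    print(pca.explained_variance_ratio_)  # variance retained by each component
    ```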

  8. Training, test, and validation data

    a. Training set: the set of examples from which a supervised learning program gains its experience;

    b. Test set: the set of examples used to evaluate the program's performance;

    c. Validation set: the set of examples used to tune hyperparameters; hyperparameters are the variables that control how the model learns;

    d. In supervised learning, the observations are typically divided into three parts: training set (50%), test set (25%), and validation set (25%);

    e. Overfitting (over-fitting) vs. underfitting: overfitting means the hypothesis fits the training set well but cannot fit data outside the training set; it is caused by noise or by too little training data; regularization can reduce the degree of overfitting;

    f. "Garbage in, garbage out": supervised learning needs a representative, correctly labeled data set for training; without good data, training on more of it does not necessarily work better than training on less;

    g. Cross-validation: run the algorithm multiple times on the same data with different train/test splits; useful when the training set is small; divide the data into N blocks, train the algorithm on N-1 blocks, and test on the remaining block (see the sketch below).
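
    A minimal cross-validation sketch (the dataset and model are placeholders): cross_val_score handles splitting the data into N blocks, training on N-1, and testing on the held-out block.

    ```python
    # 10-fold cross-validation of a decision tree on iris.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
    print(scores.mean())                  # average accuracy over the 10 folds
    ```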

  9. Performance evaluation: bias and variance

    a. In supervised learning, there are two basic metrics for assessing prediction error: bias and variance; high variance means the model overfits the training data, while high bias means it cannot fit the data well enough;

    b. Bias-variance trade-off: the two tend to move in opposite directions; reducing one typically increases the other;

    c. Unsupervised learning: there is no prediction error to measure; instead, certain properties of the data's structure are evaluated, with evaluation methods specific to the task;

    d. A supervised-evaluation example: predicting cancer, with outcomes split into true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):

    Accuracy: accuracy = (TP + TN) / (TP + TN + FP + FN), i.e., the fraction of predictions, positive and negative, that are correct;

    Precision (for the malignant class): precision = TP / (TP + FP)

    Recall: recall = TP / (TP + FN)

    For this task, recall matches the practical requirement better than the other metrics, since missing a malignant case is the costliest error.
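
    A minimal sketch computing the three metrics with scikit-learn (the labels below are made up for illustration, with 1 = malignant):

    ```python
    # Accuracy, precision, and recall on hypothetical labels.
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    y_true = [1, 1, 1, 0, 0, 0, 0, 1]      # hypothetical ground truth
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]      # hypothetical predictions: 1 FN, 1 FP

    print(accuracy_score(y_true, y_pred))   # (TP + TN) / (TP + TN + FP + FN) = 0.75
    print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 0.75
    print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 0.75
    ```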

  10. Ridge regression: fixes the distortion that arises in simple linear regression when $X^TX$ is singular or ill-conditioned (e.g., for sparse or collinear feature matrices).

    Optimization objective:

    $$
    \underset{w}{\arg\min}\; \lVert Xw - y \rVert^2 + \alpha \lVert w \rVert^2
    $$

    The corresponding closed-form matrix solution:

    $$
    w = (X^T X + \alpha I)^{-1} X^T y
    $$
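
    A sketch (with synthetic data, invented here) checking the closed-form solution against sklearn's Ridge:

    ```python
    # Closed-form ridge solution w = (X^T X + alpha*I)^(-1) X^T y vs. sklearn.
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.RandomState(0)
    X = rng.randn(50, 3)
    y = X @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.randn(50)
    alpha = 0.1

    w = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)       # closed form
    print(w)
    print(Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_)  # matches w
    ```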

  11. Reinforcement learning: a program or agent learns by continuously interacting with its environment, with the goal of maximizing cumulative return. It is trial-and-error learning: in each state (environment) the agent must try the available actions and judge their merit from the feedback the environment gives, ultimately learning the mapping from environment states to optimal actions (i.e., the policy).

  12. Markov Decision Process (MDP)

    Basic elements of an MDP:

    $$
    \begin{aligned}
    & s \in S: \text{a finite set of states; } s \text{ denotes a particular state} \\
    & a \in A: \text{a finite set of actions; } a \text{ denotes a particular action} \\
    & T(s, a, s') \sim \Pr(s' \mid s, a): \text{the state transition model, giving the probability of reaching state } s' \text{ from state } s \text{ by taking action } a
    \end{aligned}
    $$
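
    To make these elements concrete, a toy sketch (not from the original notes) that encodes such a transition model in Python and runs value iteration over it:

    ```python
    # A hand-built two-state, two-action MDP with T[s][a] = {s_next: Pr(s_next | s, a)},
    # plus value iteration to recover the optimal state values. Everything is invented.
    T = {
        "s0": {"stay": {"s0": 0.9, "s1": 0.1}, "go": {"s0": 0.2, "s1": 0.8}},
        "s1": {"stay": {"s1": 1.0}, "go": {"s0": 1.0}},
    }
    R = {"s0": 0.0, "s1": 1.0}     # reward for occupying each state
    gamma = 0.9                    # discount factor
    V = {s: 0.0 for s in T}        # initial value estimates

    for _ in range(100):           # value iteration: V(s) = R(s) + gamma * max_a E[V(s')]
        V = {s: R[s] + gamma * max(sum(p * V[s2] for s2, p in T[s][a].items())
                                   for a in T[s])
             for s in T}
    print(V)                       # the agent learns that heading toward s1 pays off
    ```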

  13. Tools for machine learning on large amounts of data: LIBLINEAR and Spark MLlib.
