Machine Learning - a brief summary

Looking back over the course content now, I found that once the many formula derivations are removed, this is basically what is left.

Outline

Categories: supervised, unsupervised, semi-supervised, and reinforcement learning

Supervised learning: the data consist of inputs and labels. Typical problems: regression, classification, sequence labeling.

Generative model: models the probability distribution and predicts from it

Discriminative model: learns the decision function directly

Maximum likelihood estimation (MLE): directly maximize the probability of the samples in the training set

Maximum a posteriori estimation (MAP): MLE plus a prior probability over the parameters
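
In the usual notation (a brief restatement, not quoted from the course), the two estimators are

\[\hat{\theta}_{MLE} = \arg\max_{\theta} P(D \mid \theta), \qquad \hat{\theta}_{MAP} = \arg\max_{\theta} P(D \mid \theta)\, P(\theta)\]

so MAP differs from MLE only by the prior factor \(P(\theta)\).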

Representative unsupervised learning task: clustering

Decision Tree

The mapping from input variables to output values can be written as a truth table; a decision tree turns this table into tree form, where each root-to-leaf path corresponds to one row of the table.

Optimization goal: keep the tree small and improve generalization

Optimal split: select the best attribute based on entropy (information gain).

Pruning: pre-pruning (do not split a node if not splitting performs better), post-pruning (replace a subtree with a leaf node if that performs better)

Handling continuous values: binary splits (choose a threshold)

Handling missing values: when splitting, generalize the information-gain formula (e.g. weight samples by the proportion on which the attribute is observed)
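
A minimal sketch (my own illustration, not the course code) of how entropy and information gain can be computed when choosing the split attribute:

```python
# Entropy and information gain for decision-tree splits; labels are plain lists.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Gain = H(Y) - sum_v |Y_v|/|Y| * H(Y_v), splitting on one attribute."""
    n = len(labels)
    groups = {}
    for y, v in zip(labels, attribute_values):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Example: a split that perfectly separates the labels has maximal gain.
y = ['yes', 'yes', 'no', 'no']
print(information_gain(y, ['a', 'a', 'b', 'b']))  # 1.0
print(information_gain(y, ['a', 'b', 'a', 'b']))  # 0.0
```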

Linear Regression

Given a data set, find a model that can predict the target values.

Linear regression: \(f(x_i) = w^T x_i + b\); fit the function by minimizing the mean squared error.

Regularization: optimize the structural risk, i.e. add \(\lambda\) times the size (e.g. absolute values) of the weight coefficients to the objective
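
A short numpy sketch (my own illustration, not the course code) of the objective above, here using a squared (L2) penalty scaled by \(\lambda\) because it admits a closed-form solution:

```python
# Linear regression f(x) = w^T x + b fitted by minimizing mean squared error
# plus an L2 penalty lambda * ||w||^2 (ridge regression, closed form).
import numpy as np

def fit_ridge(X, y, lam=0.1):
    """Closed-form ridge solution; a bias column of ones is appended to X."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])          # last coefficient is b
    reg = lam * np.eye(d + 1)
    reg[-1, -1] = 0.0                             # do not penalize the bias
    w = np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)
    return w[:-1], w[-1]                          # weights, bias

# Example: recover y = 2*x + 1 from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 2 * X[:, 0] + 1 + 0.05 * rng.standard_normal(100)
w, b = fit_ridge(X, y, lam=0.01)
print(w, b)   # roughly [2.], 1.0
```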

Probability

Chebyshev's inequality: suppose the random variable \(X\) has expectation \(E(X) = \mu\) and variance \(Var(X) = \sigma^2\). Then for any \(\epsilon > 0\), \(P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2}\).

Law of large numbers: the mean of \(n\) independent and identically distributed random variables converges in probability to \(\mu\).

Central limit theorem: the sum of a large number of independent and identically distributed variables converges in distribution to a normal distribution.
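
A quick Monte Carlo sanity check (illustrative sketch, not from the notes) of Chebyshev's bound and the law of large numbers for Uniform(0, 1) variables, which have \(\mu = 0.5\) and \(\sigma^2 = 1/12\):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=1_000_000)

# Chebyshev: empirical tail probability vs. the sigma^2 / eps^2 bound.
mu, var, eps = 0.5, 1 / 12, 0.4
empirical = np.mean(np.abs(x - mu) >= eps)
print(empirical, "<=", var / eps**2)   # about 0.2 <= 0.52 (bound holds)

# Law of large numbers: the running mean approaches mu = 0.5.
for n in (10, 1_000, 100_000):
    print(n, x[:n].mean())
```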

MLE and MAP:

MLE treats the parameter as an unknown constant that must be estimated from the data

MAP treats the parameter as a random variable with its own probability distribution (a prior)

MLE easily overfits on small data sets; MAP gives different results for different priors.
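
A standard coin-flip example (my own illustration) makes the contrast concrete: with \(k\) heads in \(n\) tosses and a Beta\((a, b)\) prior on the head probability \(\theta\),

\[\hat{\theta}_{MLE} = \frac{k}{n}, \qquad \hat{\theta}_{MAP} = \frac{k + a - 1}{n + a + b - 2}\]

For \(n = k = 3\) the MLE is 1 (overfitting the tiny sample), while a Beta(2, 2) prior gives a MAP estimate of \(4/5\); a different prior would give a different estimate.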

Bayesian decision theory

Bayesian decision theory: how to choose the optimal class label based on probabilities and misclassification losses, i.e. minimize the risk function (a small sketch follows the list of approaches below).

Decision boundary: in binary classification, the boundary is formed by the points at which the probabilities of a sample belonging to the two classes are equal.

Bayes error: the probability of classification error, \(P(\text{mistake}) = P(X \in L_1, Y = 0) + P(X \in L_0, Y = 1)\), where \(L_k\) is the region assigned to class \(k\).

Three approaches to building a classifier:

  1. Determine the class-conditional probability densities and the prior probabilities, then infer the posterior via Bayes' theorem (generative model)
  2. Model the posterior probability directly, then classify using decision theory (discriminative model)
  3. Find a function that maps the input directly to the label, with no probabilities involved.
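
A minimal sketch (my illustration, not course code) of the risk-minimization rule: given the class posteriors for a sample and a misclassification-loss matrix, choose the label with the smallest conditional risk:

```python
# Bayes decision: choose the class minimizing the conditional risk
# R(a | x) = sum_k loss[a, k] * P(Y = k | x).
import numpy as np

def bayes_decision(posteriors, loss):
    """posteriors: (n_classes,) array of P(Y=k|x); loss[a, k]: cost of
    predicting a when the true class is k. Returns the best action."""
    risks = loss @ posteriors
    return int(np.argmin(risks))

# Example: under 0-1 loss we simply pick the most probable class (class 0)...
posterior = np.array([0.7, 0.3])
zero_one = np.array([[0, 1], [1, 0]])
print(bayes_decision(posterior, zero_one))   # 0

# ... but if missing class 1 costs 10x more, the decision flips to class 1.
asym = np.array([[0, 10], [1, 0]])
print(bayes_decision(posterior, asym))       # 1
```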

KNN (k-nearest neighbors) classifier

Label a new sample by a majority vote among its k nearest training samples.

Key choices: the value of k, the distance metric, and the decision rule
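
A minimal sketch (illustration only) using Euclidean distance and majority voting:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by a majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Example: two well-separated blobs.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # 0
print(knn_predict(X, y, np.array([0.8, 0.9]), k=3))  # 1
```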

Naive Bayes

Generative model

Assume the features are conditionally independent given the class, so the likelihood factorizes over the variables; then apply Bayes' formula:
\[y_{new} = \arg\max_{y_k} P(Y = y_k) \prod_{i=1}^{n} P(X_i^{new} \mid Y = y_k)\]
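
A minimal counting-based sketch for categorical features (my own illustration; the Laplace-style smoothing is an added assumption so that unseen values do not zero out the product):

```python
import numpy as np
from collections import Counter, defaultdict

def fit_nb(X, y, alpha=1.0):
    """Estimate P(Y) and per-feature P(X_i | Y) by counting."""
    classes = sorted(set(y))
    priors = {c: (y.count(c) + alpha) / (len(y) + alpha * len(classes))
              for c in classes}
    cond = defaultdict(Counter)   # cond[(feature_index, class)][value] = count
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            cond[(i, c)][v] += 1
    return classes, priors, cond, alpha

def predict_nb(model, x):
    """argmax_y log P(Y=y) + sum_i log P(x_i | Y=y), with smoothing."""
    classes, priors, cond, alpha = model
    def score(c):
        s = np.log(priors[c])
        for i, v in enumerate(x):
            counts = cond[(i, c)]
            s += np.log((counts[v] + alpha) /
                        (sum(counts.values()) + alpha * (len(counts) + 1)))
        return s
    return max(classes, key=score)

# Example: tiny weather -> play-tennis data.
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
print(predict_nb(fit_nb(X, y), ("rain", "mild")))   # yes
```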

Logistic regression

Discriminative model. Directly learns \(P(Y \mid X)\):
\[P(Y = 1 \mid X) = \frac{1}{1 + \exp(w_0 + w^T X)}\]
This can be extended to multi-class classification. The goal, then, is to learn \(w\).

Write down the conditional log-likelihood (negative cross-entropy) \(l(w) = \sum_l \left[ Y^l \ln P(Y^l = 1 \mid X^l, W) + (1 - Y^l) \ln P(Y^l = 0 \mid X^l, W) \right]\)

Maximize it with respect to \(w\).
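
A small sketch (my own illustration) of gradient ascent on \(l(w)\), following the sign convention of the formula above, \(P(Y=1 \mid x) = 1/(1 + \exp(w_0 + w^T x))\), under which \(\partial l / \partial w = \sum_l x^l (p_1^l - y^l)\):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.1, n_iter=2000):
    """Maximize the log-likelihood l(w) by gradient ascent."""
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])      # first weight is w0
    w = np.zeros(d + 1)
    for _ in range(n_iter):
        p1 = sigmoid(-(Xb @ w))               # P(Y=1 | x) per the text
        w += lr * Xb.T @ (p1 - y) / n         # ascent step on l(w)
    return w

# Example: class 1 sits at small x, class 0 at large x (with this sign
# convention, a large w^T x pushes P(Y=1|x) toward 0).
X = np.array([[0.0], [0.2], [0.8], [1.0]])
y = np.array([1, 1, 0, 0])
w = fit_logreg(X, y)
print(sigmoid(-(np.array([1.0, 0.1]) @ w)))   # > 0.5, predict class 1
print(sigmoid(-(np.array([1.0, 0.9]) @ w)))   # < 0.5, predict class 0
```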

Support vector machine (SVM)

Find a hyperplane that separates the samples into the two classes with the maximum margin.

That is, every point of class +1 satisfies \(w^T x + b \ge C\), and every point of class -1 satisfies \(w^T x + b \le -C\).

Maximize the margin, i.e. \(2C / \|w\|\). After rescaling so that \(C = 1\), the final form is
\[\max_{w,b} \frac{1}{\|w\|_2} \quad \text{s.t.}\ y_i(w^T x_i + b) \ge 1\]
a convex quadratic optimization problem, solved using the Lagrange multiplier method.

The above is hard-margin maximization; in practice one maximizes the soft margin, i.e. a slack variable is added for each sample point, with a cost on the slack variables:
\[\min_{w,b} \frac{1}{2} \|w\|_2^2 + C \sum_i \xi_i \quad \text{s.t.}\ y_i(w^T x_i + b) \ge 1 - \xi_i,\ \xi_i \ge 0\]
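
The course solves this QP via the Lagrangian dual; as a simpler illustrative alternative (my own sketch, not the course method), the same soft-margin objective can be minimized by subgradient descent on its equivalent hinge-loss form, \(\min_{w,b}\ \frac{1}{2}\|w\|^2 + C \sum_i \max(0, 1 - y_i(w^T x_i + b))\):

```python
import numpy as np

def fit_linear_svm(X, y, C=1.0, lr=0.01, n_iter=5000):
    """Subgradient descent on the hinge-loss form of the soft-margin SVM."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        active = margins < 1                   # points violating the margin
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Example: labels in {-1, +1}, two separable clusters.
X = np.array([[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5]])
y = np.array([1, 1, -1, -1])
w, b = fit_linear_svm(X, y)
print(np.sign(X @ w + b))                      # [ 1.  1. -1. -1.]
```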

Clustering

k-means:

Clustering.

Initialize k cluster centers; assign each sample to its nearest center; then update each center's coordinates (the mean of its assigned samples); iterate until convergence.

What it actually optimizes is \(\min_{\mu, C} \sum_i \sum_{C(j) = i} \|\mu_i - x_j\|^2\)

These are effectively EM-style steps: first fix \(\mu\) and optimize \(C\), then fix \(C\) and optimize \(\mu\)
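
A minimal sketch (illustration only) of the two alternating steps described above:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Fix mu, optimize C: assign each sample to the closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Fix C, optimize mu: recompute each center as the mean of its cluster.
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Example: two obvious clusters.
X = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9]])
centers, labels = kmeans(X, k=2)
print(labels)   # e.g. [0 0 1 1] (cluster indices may be permuted)
```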

GMM (Gaussian mixture model):

The assignment function \(C\) in k-means is too hard; replace it with posterior probabilities, i.e. the probability that \(x\) belongs to each class, then do MLE. In short, this again ends with an iterative update formula.

EM steps: first compute the posterior probabilities, then update the parameters using those posterior probabilities, and iterate.
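
A short EM sketch for a Gaussian mixture (my own illustration, using scipy.stats.multivariate_normal for the densities): the E-step computes the posteriors ("responsibilities"), the M-step re-estimates the weights, means, and covariances from them.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    covs = np.array([np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component.
        dens = np.stack([w * multivariate_normal.pdf(X, mean=m, cov=c)
                         for w, m, c in zip(weights, means, covs)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted MLE of the parameters.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = np.array([((resp[:, j, None] * (X - means[j])).T @ (X - means[j]))
                         / nk[j] + 1e-6 * np.eye(d) for j in range(k)])
    return weights, means, covs, resp

# Example: two separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
w, mu, cov, resp = gmm_em(X, k=2)
print(np.round(mu, 1))   # roughly [[0, 0], [3, 3]] (component order may differ)
```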

PCA (principal component analysis)

The main purpose is dimensionality reduction: remove the correlated dimensions of the original sample space, keeping the dimensions that best represent the original data.

Specific steps:

  1. Center the data
  2. Compute the covariance matrix
  3. Perform an eigenvalue decomposition of the covariance matrix, find the eigenvectors corresponding to the k largest eigenvalues, normalize them, and assemble them into the eigenvector matrix W
  4. \(z_i=W^Tx_i\)

The idea: find the k directions in the sample space along which a unit displacement has the greatest effect, keep those, and erase the other directions, i.e. project onto a k-dimensional hyperplane.

The deleted dimensions are often associated with noise, so in a sense this is also denoising.
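
A short numpy sketch (illustration only) following the four steps above:

```python
import numpy as np

def pca(X, k):
    Xc = X - X.mean(axis=0)                  # 1. center the data
    cov = np.cov(Xc, rowvar=False)           # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # 3. eigendecomposition
    order = np.argsort(eigvals)[::-1][:k]    #    top-k eigenvalues
    W = eigvecs[:, order]                    #    projection matrix W
    Z = Xc @ W                               # 4. z_i = W^T x_i
    return Z, W

# Example: 2-D data that mostly varies along the line y = x.
rng = np.random.default_rng(0)
t = rng.standard_normal(200)
X = np.column_stack([t, t + 0.05 * rng.standard_normal(200)])
Z, W = pca(X, k=1)
print(W.ravel())   # roughly [0.707, 0.707] (up to sign): the y = x direction
```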
