Pacific Internet Recommender System Development Intern Study Notes

Machine learning: different algorithms for classification and regression

Classification (all supervised learning)

  1. Distance discriminant method, i.e., k-nearest neighbors (KNN)

KNN classifies an unknown point by looking at its k nearest neighbors (for example, the three closest points); the class that accounts for the most of those neighbors is assigned to the unknown point.
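A minimal sketch of this idea in plain NumPy (the toy data and k = 3 are assumptions for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by a majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]              # indices of the k closest points
    votes = Counter(y_train[nearest])                # count the class labels of those neighbors
    return votes.most_common(1)[0][0]                # the class with the most votes wins

# Toy 2-D data, assumed for illustration: two clusters labeled 0 and 1
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0])))  # -> 0
```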

  2. Bayesian classifier

Mainly based on Bayes' formula: P(a|b) = P(b|a) * P(a) / P(b)

Whichever class has the larger posterior probability is the predicted class.
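A tiny numeric sketch of that decision rule (the class priors and likelihoods below are made-up numbers, purely for illustration):

```python
# Bayes' rule: P(class | feature) is proportional to P(feature | class) * P(class)
# The numbers below are made up purely for illustration.
priors = {"spam": 0.4, "ham": 0.6}       # P(class)
likelihoods = {"spam": 0.8, "ham": 0.1}  # P(feature | class)

# P(feature) in the denominator cancels when we only compare classes
posteriors = {c: likelihoods[c] * priors[c] for c in priors}
prediction = max(posteriors, key=posteriors.get)  # whichever posterior is larger wins
print(posteriors, "->", prediction)               # roughly {'spam': 0.32, 'ham': 0.06} -> spam
```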

  3. Linear discriminant method, i.e., the logistic regression algorithm

Assume a linear combination w1x1 + w2x2, fit the weights from the prior (training) data, and finally use the sigmoid function to squash the result into the interval [0, 1].
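A minimal sketch of the scoring step (the weights w and bias b are assumed, as if already fit from the prior data):

```python
import numpy as np

def sigmoid(z):
    """Squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """P(y = 1 | x) for the linear score w1*x1 + w2*x2 + ... + b."""
    return sigmoid(np.dot(w, x) + b)

# Weights assumed here as if already obtained from the prior (training) data
w = np.array([0.8, -0.5])
b = 0.1
print(predict_proba(np.array([1.0, 2.0]), w, b))  # a value in (0, 1)
```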


  4. Decision tree

Build a decision tree from the prior (training) data, then classify new samples by following the tree's decisions.
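A small sketch using scikit-learn (the library and the toy data are assumptions, not something the notes specify):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy prior data, assumed for illustration: [height_cm, weight_kg] -> class 0 or 1
X = [[150, 45], [160, 55], [170, 70], [180, 85], [175, 78], [155, 50]]
y = [0, 0, 1, 1, 1, 0]

tree = DecisionTreeClassifier(max_depth=2)  # build the tree from the prior data
tree.fit(X, y)
print(tree.predict([[172, 72]]))            # classify a new sample by walking the tree
```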

  5. Support vector machine (SVM)

Find a hyperplane in the sample space that separates the different classes. Among all separating hyperplanes, choose the one least affected by perturbations of the data, i.e., the best one.

For linearly separable data, the best hyperplane is the maximum-margin one. Since the scale of w and b in w·x + b is in principle arbitrary, we can always rescale them so that |w·x + b| = 1 at the points closest to the hyperplane; those points are the support vectors. The distance between them across the hyperplane is the margin, and we want this margin to be as large as possible.

This is a convex quadratic program, and Lagrange multipliers can be used to solve its dual problem. After training, most samples do not need to be retained: the final model depends only on the support vectors.
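A small sketch with scikit-learn's linear SVC (an assumption; the notes do not name a library), showing that the fitted model is described by its support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data, assumed for illustration
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3)  # large C approximates a hard margin on separable data
clf.fit(X, y)
print(clf.support_vectors_)        # the final model is determined by these points alone
print(clf.predict([[2, 2], [5.5, 5.5]]))
```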

  6. Neural networks

Regression

Linear regression (multivariate linear regression)

y = w1x1 + w2x2 + ...

The weights are fit with stochastic gradient descent, but if this is not handled properly it can produce overfitting.
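A minimal NumPy sketch of fitting such a model with stochastic gradient descent (the synthetic data and learning rate are assumptions):

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=200, seed=0):
    """Fit y ~ X @ w by updating w one randomly chosen sample at a time."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):  # visit the samples in random order
            error = X[i] @ w - y[i]        # prediction error on a single sample
            w -= lr * error * X[i]         # gradient step for the squared loss
    return w

# Synthetic data, assumed for illustration: y = 2*x1 + 3*x2 plus a little noise
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, 3.0]) + 0.1 * rng.normal(size=100)
print(sgd_linear_regression(X, y))  # roughly [2, 3]
```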

Ridge regression

Ridge regression gives up unbiasedness: when an ill-conditioned (pathological) design matrix would otherwise destabilize the result, ridge regression trades a little bias for numerical stability.

When the features are highly collinear, ridge regression is the appropriate choice.
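A minimal sketch of the closed-form ridge solution on nearly collinear features (the data and the value of alpha are assumed for illustration):

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge solution: w = (X^T X + alpha*I)^-1 X^T y.

    The alpha*I term stabilizes the solve when X^T X is ill-conditioned
    (e.g. highly collinear columns), at the cost of a biased estimate."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Two nearly collinear features, assumed for illustration
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 1e-6 * rng.normal(size=50)  # almost an exact copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + 0.1 * rng.normal(size=50)
print(ridge_fit(X, y, alpha=1.0))     # coefficients stay at a reasonable scale
```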

Ranking (learning to rank)

Single-document (pointwise) method

  The single-document method treats an individual document as the unit of processing: each document is converted into a feature vector, and the machine learning system learns a classification or regression scoring function from the training data; the scores then determine the search results. Here we illustrate this approach with a simple example.

The training set is manually annotated. In this example we use three features for each document: the cosine similarity score between the query and the document, the proximity of the query terms within the page, and the page's PageRank value. Relevance is judged as binary, i.e., a document is either relevant or not relevant; of course, the relevance judgment could be extended to multiple grades according to the degree of relevance, but it is simplified here for ease of explanation.
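A small sketch of this pointwise setup (the feature values and the choice of logistic regression as the scoring model are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Manually annotated training data (made-up numbers), one row per (query, document) pair:
# features = [cosine similarity, query-term proximity, PageRank], label = relevant (1) or not (0)
X_train = np.array([[0.9, 0.8, 0.7],
                    [0.8, 0.6, 0.9],
                    [0.2, 0.1, 0.3],
                    [0.3, 0.2, 0.1],
                    [0.7, 0.9, 0.5],
                    [0.1, 0.3, 0.2]])
y_train = np.array([1, 1, 0, 0, 1, 0])

model = LogisticRegression().fit(X_train, y_train)

# Score the candidate documents of a new query and sort by score -> the search result order
candidates = np.array([[0.6, 0.7, 0.4], [0.2, 0.2, 0.9], [0.8, 0.5, 0.6]])
scores = model.predict_proba(candidates)[:, 1]
print(np.argsort(-scores))  # document indices from most to least relevant
```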

Document-list (listwise) method


The document-list method learns from an entire list of documents at once, rather than from individual documents.
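The notes do not name a particular listwise algorithm; as one illustration, a ListNet-style top-one loss compares the softmax distribution of the predicted scores for a whole list with that of the annotated relevance grades:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def listnet_top_one_loss(predicted_scores, relevance_labels):
    """Cross-entropy between the list-level distributions induced by labels and scores."""
    p_true = softmax(np.asarray(relevance_labels, dtype=float))
    p_pred = softmax(np.asarray(predicted_scores, dtype=float))
    return -np.sum(p_true * np.log(p_pred + 1e-12))

# One query's document list (made-up scores and graded relevance labels);
# the whole list, not a single document, is the training unit
print(listnet_top_one_loss([2.0, 1.0, 0.2], [3, 1, 0]))  # lower is better
```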


Source: www.cnblogs.com/yzwdxmw/p/12363953.html