Machine Learning Foundations Lecture 11 Notes

Lecture 11: Linear Models for Classification

11-1 Linear models for binary classification

Linear classification, linear regression and logistic regression were compared in the previous section:


Minimizing Ein for linear classification (the 0/1 error) is an NP-hard problem. Can the other two models (linear regression and logistic regression) help solve it?

Write the pointwise error of each of the three models as a function of ys, where s = w^T x is the linear score and y ∈ {−1, +1}:

0/1 error (linear classification): err_0/1(s, y) = [sign(ys) ≠ 1]

squared error (linear regression): err_SQR(s, y) = (s − y)^2 = (ys − 1)^2

cross-entropy error (logistic regression): err_CE(s, y) = ln(1 + exp(−ys))
The three error functions can be plotted against ys as follows:


Among them, the scaled cross-entropy error err_SCE(s, y) = log2(1 + exp(−ys)) is used so that it meets the 0/1 error at ys = 0 (both equal 1 there), which makes err_SCE an upper bound on err_0/1.

As long as the regression error is made small, classification can be done well:
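For every point, err_0/1(s, y) ≤ err_SCE(s, y) = (1/ln 2) · err_CE(s, y). Averaging over the data gives E_in^0/1(w) ≤ (1/ln 2) · E_in^CE(w), and combining this with the VC bound, E_out^0/1(w) ≤ E_in^0/1(w) + Ω ≤ (1/ln 2) · E_in^CE(w) + Ω. So making the cross-entropy error small also makes the out-of-sample 0/1 error small.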

Advantages of linear regression: the easiest to optimize (it has a closed-form solution); disadvantages: the squared error is a loose bound on the 0/1 error when |ys| is large.

Advantages of logistic regression: relatively easy to optimize (gradient descent); disadvantages: its upper bound on the 0/1 error is loose when ys is very negative.

Advantages of PLA: efficient and works well when the data is linearly separable; disadvantages: when the data is not linearly separable, the pocket algorithm must be used instead.

Linear regression can be run first and its solution used as the initial w_0 for PLA, pocket, or logistic regression.

In practice, people often prefer logistic regression to the pocket algorithm.


11-2 Stochastic Gradient Descent (SGD)

PLA looks at only one point per iteration; logistic regression (with gradient descent) looks at all N points in each iteration.

So how can logistic regression be as fast as PLA?

Answer: use stochastic gradient descent, which replaces the full gradient over all points with the gradient computed on a single randomly chosen point.
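Concretely, the logistic-regression gradient is grad E_in(w) = (1/N) · Σ_n θ(−y_n w^T x_n) · (−y_n x_n), where θ(s) = 1/(1 + exp(−s)). SGD drops the average over n and uses only the term of one randomly picked point, so each step is cheap and the update direction still equals the true gradient in expectation.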

Advantages: simple and cheap per iteration; disadvantages: each step follows a noisy gradient, so a single step may not move in the right direction and the process is less stable.

For logistic regression, the SGD update can be viewed as a 'soft' version of the PLA update.
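SGD logistic regression: w ← w + η · θ(−y_n w^T x_n) · (y_n x_n)

PLA: w ← w + [sign(w^T x_n) ≠ y_n] · (y_n x_n)

PLA moves by the full y_n x_n only on misclassified points, while SGD logistic regression always moves by a fraction of y_n x_n, weighted by how badly the point is currently fit.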

Deciding when to stop is difficult in SGD, so in practice one simply runs a fixed (large enough) number of iterations.

A common empirical choice for the learning rate is η = 0.1.
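A minimal sketch of the procedure above, assuming X is a NumPy matrix of feature vectors (with a constant column for the bias) and y holds labels in {−1, +1}; the function name and signature are illustrative:

import numpy as np

def sgd_logistic_regression(X, y, eta=0.1, num_iters=10000, seed=0):
    # SGD for logistic regression: one randomly chosen point per update.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(num_iters):              # fixed number of iterations (see above)
        n = rng.integers(len(y))            # pick one example at random
        s = y[n] * X[n].dot(w)              # y_n * w^T x_n
        w += eta * (1.0 / (1.0 + np.exp(s))) * y[n] * X[n]   # theta(-y_n w^T x_n) * y_n x_n
    return w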


11-3 Multiclass classification via logistic regression (One-versus-All)


Method: take one class at a time as the positive class and all the remaining classes as the negative class, so that each round becomes an ordinary binary (right-or-wrong) classification problem.


Improvement: use a 'soft' version of this idea: instead of a hard binary decision, train logistic regression so that each classifier outputs the probability of belonging to its class, and predict the class with the highest probability.

We call this method the one-versus-all method (OVA).

Advantages: efficient, and it can be paired with any logistic-regression-like method; disadvantages: when there are many classes, each binary problem becomes unbalanced (few positive points versus many negative points).
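A minimal sketch of soft OVA, assuming a binary logistic-regression trainer such as the sgd_logistic_regression sketch above (train_binary(X, y) returns a weight vector); the helper names here are illustrative:

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_ova(X, y, train_binary, classes):
    # One weight vector per class: class k versus all the rest.
    W = {}
    for k in classes:
        y_k = np.where(y == k, 1.0, -1.0)
        W[k] = train_binary(X, y_k)
    return W

def predict_ova(W, x):
    # Predict the class whose logistic model gives the highest probability.
    return max(W, key=lambda k: sigmoid(x.dot(W[k])))

For example, W = train_ova(X, y, sgd_logistic_regression, classes=np.unique(y)), then predict_ova(W, x) for a new point x.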


11-4 Multiclass classification via binary classification (One-versus-One)

One-versus-one (OVO) method: train a binary classifier for every pair of classes; for example, four classes generate six classifiers. To classify a point, all classifiers vote and the class with the most votes wins.

Advantages: efficient training, since each classifier uses only the points of its two classes; disadvantages: with many classes there are many classifiers, so storage grows and prediction takes longer.
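A minimal OVO sketch under the same assumptions (a binary trainer returning a weight vector, class labels in y); with K classes it trains K(K−1)/2 pairwise classifiers and predicts by voting:

from itertools import combinations
import numpy as np

def train_ovo(X, y, train_binary, classes):
    # One binary classifier for every pair of classes, trained only on their points.
    models = {}
    for a, b in combinations(list(classes), 2):
        mask = (y == a) | (y == b)
        y_ab = np.where(y[mask] == a, 1.0, -1.0)
        models[(a, b)] = train_binary(X[mask], y_ab)
    return models

def predict_ovo(models, x):
    # Each pairwise classifier votes for one of its two classes; most votes wins.
    votes = {}
    for (a, b), w in models.items():
        winner = a if x.dot(w) > 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)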
