Lin Xuantian's Notes on Machine Learning Techniques (5)

Other things suddenly came up, so progress has been slow O~o

Kernel Logistic Regression


P18 5.1
This section combines Logistic Regression with the kernel.
Comparing hard-margin and soft-margin SVM:
(slide: hard-margin versus soft-margin SVM)
We can characterize the slack ξ_n differently: when a point violates the margin, ξ_n = 1 - y_n(w^T z_n + b) > 0, and when it does not, ξ_n = 0, so both cases are captured by a max: ξ_n = max(1 - y_n(w^T z_n + b), 0). Substituting this back and sorting it out, the soft-margin SVM becomes the unconstrained problem
min_{b,w} (1/2) w^T w + C Σ_{n=1}^N max(1 - y_n(w^T z_n + b), 0),
(slide: soft-margin SVM rewritten without constraints)
which is very similar to the regularization we saw before (I don't remember exactly what regularization looked like there). So why can't we just solve it in this form directly? Because without constraints it is no longer a QP, so the dual and the kernel can't be used, and the max also makes the objective hard to differentiate:
(slide: why the unconstrained form is not solved directly)
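As a concrete illustration (my own minimal numpy sketch, not from the lecture; the toy data, w, b and C are arbitrary), the unconstrained objective can be written down directly:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Unconstrained soft-margin SVM objective:
    (1/2) w^T w + C * sum_n max(1 - y_n (w^T x_n + b), 0).
    The max(...) term is exactly the slack xi_n of each point."""
    margins = y * (X @ w + b)             # y_n (w^T x_n + b)
    xi = np.maximum(1.0 - margins, 0.0)   # 0 when the point respects the margin
    return 0.5 * w @ w + C * np.sum(xi)

# toy data: two points per class in 2-D
X = np.array([[2.0, 2.0], [2.5, 1.5], [-1.0, -1.0], [-2.0, 0.5]])
y = np.array([1, 1, -1, -1])
w, b, C = np.array([0.5, 0.5]), 0.0, 1.0
print(soft_margin_objective(w, b, X, y, C))
```

This also makes the non-QP issue visible: the max kinks the objective, so it is not differentiable everywhere.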
Summarizing the differences and connections between SVM and regularization: the large margin really acts as a kind of regularization.
(slide: SVM versus regularization)
Also, the C of the soft-margin SVM corresponds to the λ of regularization: a large C behaves like a small λ, and vice versa.
(slide: correspondence between C and λ)
Therefore we will consider linking SVM with the other models we have seen before.


P19 5.2
Recall err_0/1; we can plot err_SVM on the same graph. The plot shows that err_SVM lies entirely above err_0/1, so it is an upper bound on err_0/1. As with logistic regression before, once we find an upper-bound function we can minimize that bound directly and thereby indirectly improve on the original error. Moreover, err_SVM is a convex upper bound, and it is called the hinge error measure.
(slide: err_0/1 and the hinge error err_SVM)
Compared with the earlier err_SCE (the scaled cross-entropy error of logistic regression), the two are very similar, so one can stand in for the other:
(slide: err_SVM versus err_SCE)
Comparing all three errors, we see that solving an L2-regularized logistic regression problem almost gives the solution of SVM.
(slide: comparison of err_0/1, err_SVM and err_SCE)
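To make the upper-bound picture concrete, here is a small numpy sketch (my own, not from the slides) that evaluates the three error measures on a few values of ys = y·(w^T z + b):

```python
import numpy as np

ys = np.linspace(-3, 3, 7)                # the signed score y_n * s_n

err_01  = (ys <= 0).astype(float)         # 0/1 error (counting ys = 0 as an error here)
err_svm = np.maximum(1.0 - ys, 0.0)       # hinge error: convex upper bound of err_01
err_sce = np.log2(1.0 + np.exp(-ys))      # scaled cross-entropy error of LogReg

for row in zip(ys, err_01, err_svm, err_sce):
    print("ys=%5.1f  err01=%.2f  hinge=%.2f  sce=%.2f" % row)
# hinge and sce are never below err01, and they track each other closely,
# which is why regularized LogReg and soft-margin SVM end up so similar
```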
So, after solving the SVM, did you also get a LogReg solution?


P20 5.3
How do we integrate LogReg and SVM? The two naive approaches below each lean entirely toward one model and lose the advantages of the other:
(slide: two naive ways of combining SVM and LogReg)
Instead, we can introduce a scaling coefficient A and a shift constant B (used to translate the boundary) to fuse the two: g(x) = θ(A · (w_SVM^T φ(x) + b_SVM) + B). Generally A > 0, because w_SVM usually already does a good job and should not be reversed, and B ≈ 0, because b_SVM is generally already decent:
(slide: fusing SVM and LogReg with parameters A and B)
Tidying up, we get a new LogReg problem: the transform φ of the earlier LogReg is replaced by the one-dimensional SVM transform Φ_SVM(x) = w_SVM^T φ(x) + b_SVM, and only the two variables A and B remain to be learned. The procedure has two steps:
(slide: the two-step procedure)
This is the model proposed by Platt; the general steps are as follows:
(slide: Platt's probabilistic SVM)
The kernel SVM here only gives an approximate optimal LogReg solution in the Z space; what follows is to find the exact best LogReg solution in the Z space!
(slide: approximate versus exact LogReg in the Z space)
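Here is a rough sketch of the two steps (my own, using scikit-learn's SVC and LogisticRegression as stand-ins; real Platt scaling fits A and B on held-out scores with smoothed targets, which is skipped here):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)        # toy labels

# step 1: kernel SVM gives the 1-D "SVM transform" Phi_SVM(x) = w^T phi(x) + b
svm = SVC(kernel="rbf", C=1.0).fit(X, y)
scores = svm.decision_function(X).reshape(-1, 1)

# step 2: a 1-D logistic regression learns the scaling A and the shift B
lr = LogisticRegression().fit(scores, y)
A, B = lr.coef_[0, 0], lr.intercept_[0]
print("A =", A, "B =", B)                         # expect A > 0 and B close to 0

# probabilistic output for new points: theta(A * score(x) + B)
proba = 1.0 / (1.0 + np.exp(-(A * svm.decision_function(X[:5]) + B)))
print(proba)
```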


P21 5.4

This section is a bit hard to follow because it combines a lot of earlier material, some of which I had forgotten or never understood clearly, so I had to go back to Red Stone's Machine Learning Foundations notes. After reading it a few times it felt familiar again; I'll try to summarize it here, to be digested together with the experts' notes hhh

Let's first recall how SVM managed it. SVM is a quadratic program, and after taking the dual we get another QP; in that dual QP the kernel can be used to cut the cost of the inner products, so the dependence drops from O(d̃) (the dimension of the Z space) to O(d). But LogReg is not a quadratic program, so the same route does not apply directly.
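As a reminder of why the kernel helps, here is a tiny numpy check (my own illustration) that the 2nd-order polynomial kernel (1 + x^T x')^2 equals the inner product under an explicit 2nd-order transform, so the Z-space inner product is computed with only O(d) work:

```python
import numpy as np

def phi2(x):
    """Explicit 2nd-order transform of a 2-D point: d~ = 6 features."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

def k_poly2(x, xp):
    """Polynomial kernel (1 + x^T x')^2: O(d) work, no explicit transform."""
    return (1.0 + x @ xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi2(x) @ phi2(xp))   # inner product computed in the 6-D Z space
print(k_poly2(x, xp))       # the same number, computed in the 2-D X space
```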
Think about what made the kernel usable before: if w can be written as a linear combination of the z_n, then w^T z contains inner products of the form z_n^T z, which the kernel can compute. Look at the form w takes in the different algorithms below:
(slide: the form of w in several earlier algorithms)
in all of them, w is a linear combination of the z_n:
(slide: w as a linear combination of the z_n)
There is a representer theorem in mathematics: for any L2-regularized linear model
min_w (λ/N) w^T w + (1/N) Σ_{n=1}^N err(y_n, w^T z_n),
the optimal w_* is a linear combination of the z_n, i.e. w_* = Σ_{n=1}^N β_n z_n, which is consistent with the observation above.
(slide: the representer theorem)
Simple proof (by contradiction): split w_* = w_∥ + w_⊥, where w_∥ lies in the span of the z_n and w_⊥ is orthogonal to that span; we want to show w_⊥ = 0. Suppose instead that w_⊥ ≠ 0. The error term (1/N) Σ_{n=1}^N err(y_n, w_*^T z_n) is unchanged if we drop w_⊥, because w_⊥^T z_n = 0 for every n, so no matter what err is, this part of the objective is the same for w_* and for w_∥ alone. For the regularizer in front, w_*^T w_* = w_∥^T w_∥ + w_⊥^T w_⊥ > w_∥^T w_∥ whenever w_⊥ ≠ 0. So w_∥ would achieve a strictly smaller objective than w_*, contradicting the assumption that w_* is optimal.
(slide: proof sketch of the representer theorem)
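As a quick numerical sanity check (my own; it uses the squared error as err so that the optimum has a closed form), we can verify that the optimal w really has no component outside the span of the z_n:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 5, 20                   # fewer points than dimensions, so span(z_n) is a strict subspace
Z = rng.normal(size=(N, d))    # rows are the transformed points z_n
y = rng.normal(size=N)
lam = 0.5

# optimal w of  (lam/N) w^T w + (1/N) ||Z w - y||^2  (L2-regularized linear regression)
w_star = np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

# express w_star as Z^T beta as well as possible and measure the leftover part
beta, *_ = np.linalg.lstsq(Z.T, w_star, rcond=None)
print(np.linalg.norm(Z.T @ beta - w_star))   # ~1e-15: nothing is left outside span(z_n)
```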
Since w can be expressed this way, LogReg can also use the kernel: take L2-regularized logistic regression, whose pointwise error is the cross-entropy built from the sigmoid, substitute w = Σ_n β_n z_n, and after some simplification we obtain Kernel Logistic Regression (the KLR of this chapter):
(slide: the KLR objective)
Brief summary: KLR uses the representer theorem so that every z_n^T z_m can be replaced by the kernel; β is then the only variable, it is unconstrained, and the optimal β can be found with GD/SGD and the like.
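Below is a minimal numpy sketch of KLR trained by plain gradient descent on β, with an RBF kernel; the toy data, gamma, lam, eta and the step count are all my own arbitrary choices:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[n, m] = exp(-gamma * ||x_n - x_m||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def train_klr(X, y, lam=0.1, gamma=1.0, eta=0.1, steps=2000):
    """Gradient descent on beta for
    min_beta (lam/N) beta^T K beta + (1/N) sum_n log(1 + exp(-y_n (K beta)_n))."""
    N = len(y)
    K = rbf_kernel(X, gamma)
    beta = np.zeros(N)
    for _ in range(steps):
        s = K @ beta                          # scores at the training points
        p = 1.0 / (1.0 + np.exp(y * s))       # theta(-y_n s_n)
        grad = (2.0 * lam / N) * s - (1.0 / N) * (K @ (p * y))
        beta -= eta * grad
    return beta, K

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)   # a non-linear concept
beta, K = train_klr(X, y)
print("training accuracy:", np.mean(np.sign(K @ beta) == y))
```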

KLR's conversion of w can also be viewed from another angle: KLR is a linear model in β, in which the kernel acts as the transform and also appears in the regularizer:
(slide: another view of KLR)
(If you want the full derivation, watch the video...)
(slide: KLR as a linear model in β)
Question: how many dimensions does this KLR linear model have?
Since the problem has been converted to studying only the variables β_n, one per training example, it is N-dimensional.
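A tiny illustration (my own) of this β-view: the "feature vector" of a point is its vector of kernel values against the N training points, and the model is an ordinary linear model on those N features:

```python
import numpy as np

def kernel_features(x, X_train, gamma=1.0):
    """The transform in the beta-view of KLR: x maps to the N-vector
    (K(x_1, x), ..., K(x_N, x))."""
    return np.exp(-gamma * np.sum((X_train - x) ** 2, axis=1))

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # N = 3 training points
beta = np.array([0.5, -0.2, 0.1])                          # N-dimensional weights
x_new = np.array([0.5, 0.5])
print(beta @ kernel_features(x_new, X_train))              # a plain linear model in beta
```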


Summary: (Chapter 5 is finally over!!!! The last section was so hard to read)
(slide: summary of Kernel Logistic Regression)
The next section will talk about using the kernel to do general regression~


Origin blog.csdn.net/Only_Wolfy/article/details/89602875