Lin Xuantian's Notes on Machine Learning Techniques (6~16)

emmm This time I want to try only reading Red Stone's notes and then writing down whatever I feel needs adding, to see whether that works and whether the progress is faster (it has to be faster)~ since the content is mostly pieced together, there isn't much original here.

6 Support Vector Regression


P22 6.1
[figure]
Here $\beta^T K \beta$: it is written this way because, if you pick out the pair (3, 5), the term is $\beta_3 K(3,5)\,\beta_5$, i.e. the entry K(3,5) of the kernel matrix is what gets multiplied by the two corresponding β coefficients.

[figure]
Transposing K changes nothing because K is a symmetric matrix ($K^T = K$), and the identity matrix $I$ shows up because λ is only a scalar coefficient, so it has to be padded out into a matrix ($\lambda I$) before it can be added to K.
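To make sure I understand the closed-form solution $\beta = (\lambda I + K)^{-1} y$ from this section, here is a tiny numpy sketch I wrote (my own illustration; the Gaussian kernel and the parameter names are just my example choices, not from the slides):

```python
# Tiny sketch of kernel ridge regression from 6.1: beta = (lambda*I + K)^{-1} y.
# My own numpy illustration; the Gaussian (RBF) kernel is just an example choice.
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def kernel_ridge_fit(X, y, lam=0.1, gamma=1.0):
    K = rbf_kernel(X, X, gamma)                          # symmetric N x N kernel matrix
    beta = np.linalg.solve(lam * np.eye(len(y)) + K, y)  # solve (lambda*I + K) beta = y
    return beta

def kernel_ridge_predict(X_train, X_test, beta, gamma=1.0):
    return rbf_kernel(X_test, X_train, gamma) @ beta     # g(x) = sum_n beta_n K(x_n, x)
```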


P23 6.2


P24 6.3
[figure]
??? Why is it subtracted here?
Answer: just derive it step by step the way Chapter 4 does and it works out.
[figure]
Then here, matching things up by color (the analogy between the two problems) is enough, for example:
$1\cdot(y_n - w^Tz_n - b) \le \epsilon + \xi_n$
Simplified: $w^Tz_n + b \ge y_n - \epsilon - \xi_n$
Analogous to the soft-margin SVM constraint: $y_n(w^Tz_n + b) \ge 1 - \xi_n$
The coefficient 1 here plays the role of $y_n$, and $y_n - \epsilon$ plays the role of the 1; the same goes for the other terms.
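For my own reference, the soft-tube SVR primal that this color analogy is about, written from the usual formulation (my reconstruction, not a copy of the slide):

$$\min_{b,\,w,\,\xi^{\vee},\,\xi^{\wedge}} \quad \frac{1}{2} w^T w + C \sum_{n=1}^{N} \left(\xi_n^{\vee} + \xi_n^{\wedge}\right)$$

$$\text{s.t.} \quad -\epsilon - \xi_n^{\vee} \;\le\; y_n - w^T z_n - b \;\le\; \epsilon + \xi_n^{\wedge}, \qquad \xi_n^{\vee} \ge 0,\; \xi_n^{\wedge} \ge 0$$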


P25 6.4
Summary


7 Blending and Bagging

I tried going through a chapter without taking notes and then recalling it in detail. It feels... good? But SVM is finished now, so I still need to go back and review it.


P26 7.1
[figure]
Before, it was validation: pick the single best hypothesis. The aggregation to be discussed next uses collective wisdom to solve the problem with a whole bunch of (maybe not very good) hypotheses; combining a bunch of weak ones may turn out to be very strong.
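A minimal sketch of the simplest kind of aggregation, uniform blending by majority vote (my own toy illustration; `hypotheses` is assumed to be a list of already-trained ±1 classifiers, which is my assumption for the example):

```python
# Uniform blending for binary classification: every g gets one equal vote,
# and the final prediction is the sign of the vote total (collective wisdom).
import numpy as np

def uniform_blend_predict(X, hypotheses):
    """hypotheses: list of fitted classifiers whose predict() returns +1/-1 labels."""
    votes = sum(g.predict(X) for g in hypotheses)   # collect one vote per hypothesis
    return np.sign(votes)                           # majority decision
```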


8 Adaptive Boosting

Progress is much faster now (of course).


P30 8.1
P31 8.2
P32 8.3
[figure]
Actually, all these operations were done just to make the g's different (these g's shouldn't be very good on their own); the g's are relatively simple, each only making small mistakes from some particular angle, just like the primary-school students recognizing apples in 8.1: each g can perform well in some respects, but by itself it isn't very usable.

Some section 8.x also said that with different g's (different voices), the resulting algorithm (I didn't remember which one... oh right, adaptive boosting) works well. If the g's are all similar, there is no point in averaging them.
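To pin down the re-weighting trick that makes the g's different, here is a rough AdaBoost sketch (my own illustration; using scikit-learn decision stumps as the simple g's is my assumption, not something from the notes):

```python
# Rough sketch of AdaBoost: re-weight the examples each round so the next simple g
# is forced to focus on the mistakes of the previous one, then let the g's vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=10):
    """y must be +1/-1 labels. Returns the stumps g_t and their vote weights alpha_t."""
    N = len(y)
    u = np.ones(N) / N                               # example weights: start uniform
    stumps, alphas = [], []
    for _ in range(T):
        g = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=u)
        pred = g.predict(X)
        eps = np.sum(u[pred != y]) / np.sum(u)       # weighted error of this g
        eps = np.clip(eps, 1e-10, 1 - 1e-10)
        scale = np.sqrt((1 - eps) / eps)             # scaling factor (the diamond in the lecture)
        u[pred != y] *= scale                        # emphasize the mistakes...
        u[pred == y] /= scale                        # ...and shrink the correct ones
        stumps.append(g)
        alphas.append(np.log(scale))                 # alpha_t = ln(scale)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    votes = sum(a * g.predict(X) for g, a in zip(stumps, alphas))
    return np.sign(votes)
```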


9 Decision Tree

P35 9.1
There are many kinds of Decision Tree models and there is no strong theoretical guarantee, but in practice they work quite well. Here the course presents a Decision Tree that fits its own style.


P36 9.2
[figure]
Here it says that there are two classes, with proportions μ and 1 − μ, and you just substitute. == It took me several reads: $\mu = \frac{N_1}{N}$ should be taken as given, and since K = 2 the other class is directly 1 − μ.
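Writing the substitution out explicitly (my own working, assuming the impurity being substituted into is the standard Gini index from 9.2):

$$\text{Gini} = 1 - \sum_{k=1}^{K} \mu_k^2 \quad\Rightarrow\quad (K=2):\;\; 1 - \mu^2 - (1-\mu)^2 = 2\mu(1-\mu)$$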


P36 9.3
P37 9.4


10 Random Forest


P38 10.1
[figure]
At the end of the video, I don't quite understand why this can be done. What is $p_i \in$ basis?
Solved: it's to avoid interference from unimportant features and to see which feature is more important, using importance(i) (rough sketch below).
P39 10.2
P40 10.3
P41 10.4
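A rough sketch of the permutation-style importance(i) idea as I understand it (my own illustration, assuming `model` is any fitted classifier with a `predict` method; not necessarily the exact procedure from the slides):

```python
# importance(i) by permutation: shuffle the values of feature i and measure
# how much the model's accuracy drops; a big drop means the feature mattered.
import numpy as np

def permutation_importance(model, X, y, rng=np.random.default_rng(0)):
    """Return one importance score per feature: baseline accuracy minus
    accuracy after permuting that feature's column."""
    baseline = np.mean(model.predict(X) == y)
    scores = []
    for i in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, i] = rng.permutation(X_perm[:, i])    # destroy feature i only
        scores.append(baseline - np.mean(model.predict(X_perm) == y))
    return np.array(scores)    # larger drop => more important feature
```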


11 Gradient Boosted Decision Tree

P42 11.1
P43 11.2
[figure]
How to find the derivative?
P44 11.3
P45 11.4
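To help myself remember the core loop of GBDT for squared-error regression from 11.4, a minimal sketch (my own illustration: I use a fixed learning rate `eta` instead of the α_t the lecture obtains from one-dimensional linear regression, and the scikit-learn regression trees are just a convenience):

```python
# Minimal GBDT sketch for squared-error regression: each round fits a small
# regression tree to the current residuals and adds it to the ensemble score.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, T=50, eta=0.1):
    trees, s = [], np.zeros(len(y))          # s = current ensemble prediction
    for _ in range(T):
        residual = y - s                     # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        s += eta * tree.predict(X)           # take a small step along the new tree
        trees.append(tree)
    return trees

def gbdt_predict(X, trees, eta=0.1):
    return eta * sum(t.predict(X) for t in trees)
```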
Summary: things really got messy in this chapter. After reading other people's notes and then reading Teacher Lin's own summary it feels smoother, but you still have to work through it yourself to really understand it~ Next up are neural networks and deep learning. I've never known the difference between ML, NN, and DL. I'm excited! ٩(๑>◡<๑)۶


12 Neural Network

P46 12.1
P47 12.2
P48 12.3
P49 12.4


13 Deep Learning

P50 13.1
P51 13.2
P52 13.3
P53 13.4
[figure]
I was a little confused here, not knowing why it is $W^T$. In fact you can figure it out by writing it on paper: $W$ is $d \times \tilde{d}$, so $W^T$ is naturally $\tilde{d} \times d$, and $x$ is $d \times 1$. Although the light-yellow part writes $h(x) = WW^Tx$, it is really $h(x) = W(W^Tx)$: you have to compute what is inside the brackets first, and by writing it out on paper you can see that otherwise the resulting matrices cannot be multiplied.

Supplement: after watching Andrew Ng's matrix review in week one, I found that I was wrong. Matrix multiplication obeys the associative law but not the commutative law:
Matrices are not commutative: $A * B \ne B * A$
Matrices are associative: $(A * B) * C = A * (B * C)$
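A tiny numpy check of the point above, with made-up sizes d = 5 and d̃ = 3 (my own toy example, not from the lecture):

```python
# (W W^T) x and W (W^T x) give the same result because matrix multiplication is
# associative, even though the intermediate shapes differ; commutativity fails.
import numpy as np

rng = np.random.default_rng(0)
d, d_tilde = 5, 3                        # assumed toy sizes
W = rng.standard_normal((d, d_tilde))    # W is d x d~
x = rng.standard_normal((d, 1))          # x is d x 1

h1 = (W @ W.T) @ x      # W W^T is d x d, then times x
h2 = W @ (W.T @ x)      # W^T x is d~ x 1, then W times it
print(np.allclose(h1, h2))    # True: associativity

A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))
print(np.allclose(A @ B, B @ A))    # generally False: not commutative
```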

[figure]
The derivation in this section is a bit hardcore; you have to brush up on linear algebra and Lagrange multipliers.
[figure]
It says this is very similar to PCA. Red Stone recommended a website that introduces the mathematical principles of PCA; I'll take a look when I have time...
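Before reading that website, a minimal PCA sketch to keep in mind how it connects to the linear autoencoder (my own illustration of the standard procedure; the function and variable names are mine):

```python
# Standard PCA sketch: center the data, then project onto the top-d~ eigenvectors
# of the covariance matrix (the "best" linear encoding, like the linear autoencoder).
import numpy as np

def pca(X, d_tilde):
    """X: N x d data matrix. Returns the N x d~ projected data and the d x d~ basis W."""
    X_centered = X - X.mean(axis=0)
    cov = X_centered.T @ X_centered / len(X)      # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :d_tilde]             # top-d~ principal directions
    return X_centered @ W, W
```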


14 15 16

Finally finished going through it all!!! Oops, Mr. Lin's summary in the last chapter, 16, is very, very good!! To sum up: for most of the material I know what it is, but I don't know how to implement it in practice. After all, after hearing an algorithm explained by a senior, I might still not be able to write it hhh. PCA, though, is really weak for me, only a vague impression; when I saw the name I couldn't remember what it was, but flipping back and re-reading it, it came back and felt right... In fact it's mostly a lot of terminology and not that hard (maybe because I haven't implemented anything yet, 2333).
Here is a final screenshot of the jungle of machine learning:
[figure]
After finishing the last chapter I feel quite tired, but when the final summary tied everything together, the feeling was simply "I still want to learn more!!" Thank you very much, Mr. Lin, for leading the way~
Now on to Andrew Ng!!!!

Origin blog.csdn.net/Only_Wolfy/article/details/89608647