emmm Now I want to try only reading Red Stone's notes, writing down anything I'd add myself, and see whether that works well and keeps the progress fast (it must be fast)~ since the content here is just combined from those notes without much else.
6 Support Vector Regression
P22 6.1
Here β^T K β is written because: if you expand it, a term like β_3 K(3,5) β_5 appears, i.e. the entry K(3,5) is exactly the coefficient multiplying the two βs.
Adding a transpose to K changes nothing, because K is a symmetric matrix (K^T = K); and the identity matrix I appears because λ is just a scalar coefficient, so to add it to a matrix we need λI, giving the solution β = (λI + K)^{-1} y.
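To make this concrete, here is a minimal numpy sketch of kernel ridge regression using the closed form β = (λI + K)^{-1} y from this lecture. The RBF kernel and all parameter values are my own choices for illustration, not from the notes:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # K[n, m] = exp(-gamma * ||x_n - x_m||^2)
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def kernel_ridge_fit(X, y, lam=0.1, gamma=1.0):
    # beta = (lambda*I + K)^{-1} y : always invertible because K is
    # positive semi-definite and lambda > 0
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(lam * np.eye(len(X)) + K, y)

def kernel_ridge_predict(beta, X_train, X_test, gamma=1.0):
    # g(x) = sum_n beta_n * K(x_n, x)
    return rbf_kernel(X_test, X_train, gamma) @ beta
```

With a tiny λ the fit almost interpolates the training points; a larger λ smooths it out.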
P23 6.2
P24 6.3
??? Why is it subtracted here?
Answer: just push through the derivation the same way as in Chapter 4.
Then here, matching up the colored terms by analogy is enough. For example:

y_n - w^T z_n - b <= ε + ξ_n

Rearranged: w^T z_n + b >= y_n - ε - ξ_n

Compare with the soft-margin constraint y_n (w^T z_n + b) >= 1 - ξ_n: the coefficient 1 in front of (w^T z_n + b) here corresponds to the y_n there, and the y_n - ε here corresponds to the 1 there; the same applies to the other terms.
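That ε is the width of the "tube": errors smaller than ε cost nothing. A tiny sketch of the ε-insensitive error behind SVR (the numbers are my own toy values):

```python
import numpy as np

def tube_err(y, s, eps=0.5):
    # epsilon-insensitive error: zero inside the tube |s - y| <= eps,
    # then growing linearly outside it
    return np.maximum(0.0, np.abs(s - y) - eps)

print(tube_err(0.0, 0.3))   # inside the tube -> 0.0
print(tube_err(0.0, 1.5))   # outside -> 1.5 - 0.5 = 1.0
```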
P25 6.4
Summary
7 Blending and Bagging
I tried going through a chapter without taking notes, and I feel I remembered it in detail. Good? But now that SVM is over, I still need to review it.
P26 7.1
Before this, it was validation. The aggregation to be discussed next uses collective wisdom to solve problems with a bunch of (maybe not very good) hypotheses; the combination of a bunch of weak ones may turn out very strong.
8 Adaptive Boosting
Progress is much faster now (of course)
P30 8.1
P31 8.2
P32 8.3
We actually did all these operations just to make the g's different (these g's need not be very good individually); each g is relatively simple, just making slightly fewer mistakes from one particular angle, like the primary-school students recognizing apples in 8.1: each g does well in some respect but is not very usable on its own.
Section 8.x also said that with different g's (different voices), the algorithm built from them (adaptive boosting) works well; if they are all similar, averaging them is meaningless.
9 Decision Tree
P35 9.1
There are many Decision Tree models and not much theoretical guarantee, but in practice they work well. Here is the Decision Tree used in this course.
P36 9.2
It said that with two classes, one fraction is μ and the other is 1 - μ, then substitute in. == I looked at this several times: μ = N_1/N should be given, and because K = 2, the other class is directly 1 - μ.
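A quick check of that two-class shortcut with Gini impurity (the toy labels are my own):

```python
import numpy as np

def gini(y):
    # Gini impurity: 1 - sum_k mu_k^2, mu_k = fraction of class k
    _, counts = np.unique(y, return_counts=True)
    mu = counts / counts.sum()
    return 1.0 - np.sum(mu ** 2)

# with K = 2 the fractions are mu and 1 - mu, so it collapses to 2*mu*(1-mu)
y = np.array([1, 1, 1, 0])               # mu = 3/4
print(gini(y))                           # 1 - 9/16 - 1/16 = 0.375
```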
P36 9.3
P37 9.4
10 Random Forest
P38 10.1
At the end of the video, I don't quite understand why it is possible to do this. What is p_i ∈ basis?
It has been solved: it is to avoid the interference of unimportant features and to see which feature is more important, using importance(i).
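A sketch of the permutation idea behind importance(i): permute feature i's values, and importance(i) = performance(D) minus performance on the permuted data. The toy "model" below is my own stand-in, not the course's random forest:

```python
import numpy as np

def permutation_importance(model_score, X, y, seed=0):
    # importance(i) = performance(D) - performance(D with feature i permuted):
    # shuffling an important feature hurts a lot, an unimportant one barely
    rng = np.random.default_rng(seed)
    base = model_score(X, y)
    imp = np.zeros(X.shape[1])
    for i in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, i] = rng.permutation(Xp[:, i])   # break feature i only
        imp[i] = base - model_score(Xp, y)
    return imp

# toy stand-in "model": predicts the sign of feature 0, ignores feature 1
score = lambda X, y: np.mean(np.sign(X[:, 0]) == y)
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = np.sign(X[:, 0])
imp = permutation_importance(score, X, y)
# feature 0 comes out far more important than feature 1 (whose score is 0)
```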
P39 10.2
P40 10.3
P41 10.4
11 Gradient Boosted Decision Tree
P42 11.1
P43 11.2
How to find the derivative?
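For squared error the answer works out nicely: the derivative of (y_n - s_n)^2 with respect to s_n is -2(y_n - s_n), so the steepest-descent direction at each point is just the residual, and each round fits the residuals. A minimal sketch under that squared-error assumption (the regression stumps and the shrinkage eta are my own simplifications, not exactly the lecture's setup):

```python
import numpy as np

def fit_stump_reg(x, r):
    # least-squares regression stump: one split, a constant on each side
    xs = np.unique(x)
    best = (np.inf, None, None, None)
    for t in (xs[:-1] + xs[1:]) / 2:
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]

def stump_reg_predict(stump, x):
    t, cl, cr = stump
    return np.where(x <= t, cl, cr)

def boost_regression(x, y, T=20, eta=0.5):
    # each round fits the current residuals y - s, then nudges s toward y
    s = np.zeros_like(y)
    stumps = []
    for _ in range(T):
        stump = fit_stump_reg(x, y - s)
        s = s + eta * stump_reg_predict(stump, x)
        stumps.append(stump)
    return stumps, s

x = np.array([0.0, 1, 2, 3])
y = np.array([0.0, 0, 1, 1])
stumps, s = boost_regression(x, y)
print(np.round(s, 4))   # very close to y after 20 rounds
```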
P44 11.3
P45 11.4
Summary: this bunch of things is really messy. After reading other people's notes and then Teacher Lin's summary it feels smoother, but you still have to do it yourself to really understand~ Next up are neural networks and deep learning. I have never known the difference between ML, NN, and DL. I am excited! ٩(๑>◡<๑)۶
12 Neural Network
P46 12.1
P47 12.2
P48 12.3
P49 12.4
13 Deep Learning
P50 13.1
P51 13.2
P52 13.3
P53 13.4
I was a little confused here; I didn't know why it is W^T. Actually you can see it by writing it on paper: W is d × d~, so naturally W^T is d~ × d, and x is d × 1. Although the light yellow there writes h(x) = W W^T x, I thought it should be h(x) = W (W^T x), computing the bracket first, because writing it on paper I believed the resulting matrices otherwise could not be multiplied.
Supplement: after watching Andrew Ng's matrix review in the first week, I found I was wrong. Matrix multiplication obeys the associative law but not the commutative law:
Matrices are not commutative: A∗B ≠ B∗A
Matrices are associative: (A∗B)∗C = A∗(B∗C)
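A quick numpy check of both facts; the shapes and values are my own toy example:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))   # d = 4, d~ = 2
x = rng.standard_normal(4)

# associative: the grouping does not matter, so W W^T x is well defined
assert np.allclose((W @ W.T) @ x, W @ (W.T @ x))

# but not commutative: W^T W is 2x2 while W W^T is 4x4
print((W.T @ W).shape, (W @ W.T).shape)   # (2, 2) (4, 4)
```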
The derivation in this section is a bit hard-core, so we have to look at linear algebra and Lagrange multipliers.
It said this is very similar to PCA; Red Stone recommended a website that introduces the mathematical principles of PCA, which I'll look at when I have time...
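For reference, here is the standard PCA recipe via eigendecomposition of the covariance matrix (my own minimal sketch with toy data, not necessarily the exact derivation that website uses):

```python
import numpy as np

def pca(X, k):
    # center the data, then take the top-k eigenvectors of the covariance
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    vals, vecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    W = vecs[:, ::-1][:, :k]            # top-k directions, shape d x k
    return Xc @ W, W                    # projected coordinates and basis

# toy data lying exactly on a line: one component reconstructs it perfectly
X = np.array([[0.0, 0], [1, 1], [2, 2], [3, 3]])
Z, W = pca(X, 1)
X_rec = Z @ W.T + X.mean(axis=0)
print(np.allclose(X_rec, X))   # True
```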
14 15 16
Finally finished!!! Oops, Mr. Lin's summary in the last chapter 16 is very, very good!! To sum up: I mostly know what each thing is, but not how to implement it in practice. After all, after hearing an algorithm from a senior, you may still not be able to write it, hhh. However, my PCA is really weak, with only a vague impression: when I read it I couldn't remember what it was, but when I turned back and re-read it, I remembered it and it felt right... In fact it is mostly a lot of nouns, and not difficult (maybe because I haven't implemented it yet, 2333). Let's
take a final look at the jungle of machine learning:
After finishing the last chapter I feel quite tired, but when the final summary came, the feeling was just "I still want to learn more!!" Thank you very much, Mr. Lin, for leading the way~
Now let's start with Andrew Ng!!!!