Li Hongyi Machine Learning Course Notes
Foreword
Li Hongyi's 2021 Spring Machine Learning Course Notes
Part 1
1. What is machine learning?
In one sentence, machine learning is about enabling a machine to find a function.
2. How to find this function
Machine learning finds this function in three steps:
- Write down a function with unknown parameters
- Define a loss function
- Solve an optimization problem
We will use predicting the daily view count of a YouTube channel to illustrate how these three steps work.
1. Write a function with unknown parameters
Simply put, we first guess what mathematical form the function we are looking for might take. For the YouTube view-count prediction, we guess a simple linear function:
y=w*x+b
- y is the value we want to predict
- x is the input
- w and b are the unknown parameters
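The model in step 1 can be sketched in a few lines; the parameter values below are arbitrary guesses purely for illustration.

```python
# Linear model y = w * x + b with unknown parameters w and b:
# predict the next day's view count y from the current day's count x.
def model(x, w, b):
    return w * x + b

# Example with arbitrary guessed parameters w = 1.0, b = 100.0:
print(model(4800, 1.0, 100.0))  # -> 4900.0
```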
2. Define a loss function
Given our candidate function and its parameters, we need a way to measure whether the function behaves as we expect. Simply put, for y = w*x + b above, we can take the absolute value of the difference between the predicted y and the true y to represent the gap between the two. This value is the loss, and it tells us how good or bad the current parameter values are.
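The absolute-difference loss described above, averaged over a training set, is a mean absolute error. Here is a minimal sketch; the view counts are made-up numbers, not the course's actual data.

```python
# Mean absolute error of the linear model over a small dataset.
def l1_loss(xs, ys, w, b):
    # For each example: |prediction - label|, then average.
    return sum(abs((w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)

xs = [4800, 4900, 7500]   # previous day's views (hypothetical)
ys = [4900, 7500, 8000]   # actual next-day views (hypothetical)
print(l1_loss(xs, ys, 1.0, 0.0))  # loss for the guess w=1, b=0
```

Trying different (w, b) pairs and comparing their losses is exactly the comparison step 3 automates.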
3. Find the best parameters
So how do we find the parameter values that minimize this loss?
If we focus on one parameter at a time and hold the others fixed, the problem reduces to finding the minimum of a single-variable function: at an extreme point, the first derivative is 0. When we cannot solve this analytically, we can pick an initial point, compute the derivative there (i.e. the slope at that point), use the slope to decide which direction to step so as to move toward the bottom of the curve, and repeat until the derivative is 0. Obviously a function may have more than one extreme point, so this method may only find a local minimum rather than the global optimum.
The teacher's original remark on this point: if you have done deep-learning work before and have experience training your own networks with gradient descent, local minima are actually a fake problem. When we run gradient descent, the real difficulty is not local minima; what it actually is will be covered later. For now, accept the common claim that gradient descent suffers from local minima. In this picture and this example there clearly is a local-minimum problem, but the real pain point of gradient descent will be explained later.
4. How to do better
The predictions above were all evaluated on the real data we already have. What about performance on genuinely unseen data?
Looking at the result, the prediction is almost identical to the previous day's value. The loss on the training data is 0.45k, while the error on the actual (unseen) prediction is 0.58k. We can also notice that the data appears to be periodic, rising and falling in a regular cycle, so if we take this cycle into account, for example by using the previous seven days' values as input, the result may be better.
This indeed gives a lower training loss of 0.38k, and a better real-prediction error of 0.49k.
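The seven-day model replaces the single input x with the previous seven days' values, each with its own weight. A minimal sketch, with made-up weights and view counts:

```python
# Extended linear model: y = b + sum_j w_j * x_j over the past 7 days.
def model7(x_week, ws, b):
    assert len(x_week) == len(ws) == 7
    return b + sum(w * x for w, x in zip(ws, x_week))

x_week = [4800, 4900, 7500, 8000, 4900, 3500, 3300]  # hypothetical views
ws = [0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1]             # hypothetical weights
print(model7(x_week, ws, 50.0))
```

The same loss definition and gradient descent apply; there are simply eight unknown parameters (seven weights plus b) instead of two.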
Summary
Many of the models we study are linear systems.
A linear system has a formal definition; my simple understanding of it is:
1. Homogeneity: scaling the input by k scales the output by k as well.
2. Additivity: the superposition principle holds between different inputs.

A system that satisfies both conditions is a linear system; examples include y = kx, integration, differentiation, and matrix transposition. But y = x^2 is not.
In common circuits, resistors, inductors, and capacitors are linear elements, and circuits composed of them are also linear systems.
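The two conditions can be checked numerically for a given function. A small sketch (the sample values 2, 3, and 5 are arbitrary; a real proof would require the conditions to hold for all inputs, not just these):

```python
# Numerically spot-check homogeneity and additivity for a function f.
def looks_linear(f, k=2.0, x1=3.0, x2=5.0, tol=1e-9):
    homogeneous = abs(f(k * x1) - k * f(x1)) < tol        # f(kx) == k f(x)
    additive = abs(f(x1 + x2) - (f(x1) + f(x2))) < tol    # superposition
    return homogeneous and additive

print(looks_linear(lambda x: 4 * x))   # True: y = kx passes both checks
print(looks_linear(lambda x: x ** 2))  # False: y = x^2 fails both
```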