Li Hongyi Machine Learning Course Notes - 1


Foreword

Notes for Li Hongyi's Spring 2021 Machine Learning course
Part 1


1. What is machine learning?

In general, machine learning can be described in one sentence: machine learning gives a machine the ability to find a function.

2. How to find this function

Machine learning finds this function in three steps:

  1. Write a function with unknown parameters
  2. Define a loss from the training data
  3. Solve an optimization problem

We use the example of predicting a YouTube channel's daily view count to explain how these three steps work.

1. Write a function with unknown parameters

To put it simply, we first guess the mathematical form of the function we are looking for. For predicting the channel's view count, we guess a simple linear function:

y = w*x + b

y is the value we want to predict (e.g. tomorrow's view count)
x is the input (e.g. today's view count)
w and b are unknown parameters
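
As a minimal sketch, this candidate function can be written as a one-line Python function; the names `model`, `w`, and `b` are illustrative, not from the course:

```python
# Candidate function with unknown parameters w and b.
# x is the input feature (e.g. today's view count in thousands).
def model(x, w, b):
    return w * x + b
```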

2. Define a loss function

Given our candidate function, we need a way to judge whether a particular choice of parameters is any good. To put it simply, for y = w*x + b above, we can take the difference between the predicted y and the true y, take its absolute value, and average it over the training data. This gap between prediction and truth is the value of the loss, and it tells us how good or bad a set of parameter values is.
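
Sketched in plain Python, the absolute-error loss described above might look like the following (all names are illustrative):

```python
# Mean absolute error over the training data for the model y = w*x + b.
# xs are the inputs, ys the corresponding true values.
def mae_loss(w, b, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = w * x + b        # prediction
        total += abs(y - y_hat)  # absolute gap to the true value
    return total / len(xs)       # average over all examples
```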

3. Find the best parameters

So how do we find the parameter values that minimize this loss?
If we focus on one parameter at a time and hold the rest fixed, the problem becomes finding the minimum of a function of a single variable, and an extreme point is where the first derivative is 0. When we cannot solve for that point analytically, we can instead pick an initial point, compute the derivative there (the slope of the curve at that point), and use the slope to decide which direction to move to go downhill; repeating this step until the derivative is 0 is gradient descent. Obviously a function may have more than one extreme point, so this method may only find a local minimum, not the global optimum.
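
A bare-bones version of this loop might look like the sketch below. The course uses the absolute-value loss above, but squared error is swapped in here because its gradient is simpler to write down; the learning rate and step count are illustrative choices:

```python
# Gradient descent for y = w*x + b with squared-error loss.
def gradient_descent(xs, ys, lr=1e-4, steps=1000):
    w, b = 0.0, 0.0                    # arbitrary initial point
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y      # signed prediction error
            grad_w += 2 * err * x / n  # dL/dw of the mean squared error
            grad_b += 2 * err / n      # dL/db
        w -= lr * grad_w               # step downhill along the slope
        b -= lr * grad_b
    return w, b
```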

But the teacher's original words on this question were roughly: if you have done deep-learning work yourself, trained your own network, and have experience with Gradient Descent, then local minima are in fact a fake problem. When we do Gradient Descent, the real problem is not local minima; what it actually is, we will talk about later. For now, accept what most people say, that Gradient Descent has a local-minima problem. In this picture and this example there obviously is a local minimum, but later I will tell you what the real pain point of Gradient Descent is.

4. How to do better

The above results are all fitted to the data we already have. What happens when we actually predict unseen data?
Looking at this result, the prediction is almost identical to the previous day's value. The loss on the training data is 0.45k, while the error on real (unseen) predictions is 0.58k. We can also notice that the data seem to follow a pattern, rising and falling periodically, so if we take this cycle into account, for example by using the values of the previous seven days as input, the result may be better.
With the seven-day model we indeed get a lower training loss, 0.38k, and a better result on real predictions, 0.49k.
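
The seven-day variant replaces the single input with a weighted sum over the previous seven days; a minimal sketch, where the weights and view counts below are made up for illustration:

```python
import numpy as np

# Seven-day model: y = b + sum_j w_j * x_j over the previous 7 daily counts.
def model_7day(x_window, w, b):
    return b + np.dot(w, x_window)   # x_window and w both have length 7

# Hypothetical usage with made-up view counts (in thousands):
w = np.full(7, 1.0 / 7)             # placeholder weights, not learned values
b = 0.0
last_week = np.array([4.9, 5.1, 5.3, 4.8, 4.7, 5.0, 5.2])
print(model_7day(last_week, w, b))  # prediction for the next day
```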

Summary

Many of the models we study are linear systems.
A linear system has a formal definition; my simple understanding of it is:
1.

Input R → System → Output C
Input k*R → System → Output k*C

That is, if the input is scaled by k, the output is also scaled by k.
2.

Input R1 → System → Output C1
Input R2 → System → Output C2
Input R1+R2 → System → Output C1+C2

That is, the principle of superposition holds across different inputs.
A system that satisfies both conditions is a linear system;
examples include y = kx, integration, differentiation, and matrix transposition,
but y = x^2 is not.
In common circuits, resistors, inductors, and capacitors are linear components, and circuits composed of them are also linear systems.
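
The two conditions can be checked numerically for any single-input function standing in for the "system"; a small sketch, with arbitrary test values:

```python
# Numerically test homogeneity and superposition for a function f.
def looks_linear(f, r1=2.0, r2=3.0, k=5.0, tol=1e-9):
    homogeneous = abs(f(k * r1) - k * f(r1)) < tol        # k*input -> k*output
    superposed = abs(f(r1 + r2) - (f(r1) + f(r2))) < tol  # sums map to sums
    return homogeneous and superposed

print(looks_linear(lambda x: 3 * x))   # True:  y = kx is linear
print(looks_linear(lambda x: x ** 2))  # False: y = x^2 is not
```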


Source: blog.csdn.net/m0_66478571/article/details/122801665