The simplest intuition for gradient descent in machine learning/deep learning

1: Set a small target: the prediction function

  A common task in machine learning is to automatically discover the patterns behind data through a learning algorithm, continuously improve the model, and then make predictions.

That is: learning algorithm → discover patterns → improve the model → make predictions.
  To make this concrete, take a simple example: there are some sample points in a two-dimensional Cartesian coordinate system, where the horizontal and vertical coordinates represent a pair of related variables, such as the size of a house and its price.


  Common sense suggests that the two quantities are roughly proportional, that is, the points lie near a straight line y = wx through the origin. Our task is to design an algorithm so that the machine can fit these data and compute the line's parameter w for us.
  A simple method is: first pick a random straight line through the origin.
  Then measure how far all the sample points deviate from the line, and adjust the slope w of the line according to the size of that error. In this problem, the line y = wx is the so-called prediction function.
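The loop described above, pick a line, measure the error, nudge the slope, can be sketched in a few lines of Python. The sample numbers, initial slope, and learning rate below are all made up for illustration; the update rule uses the mean-squared-error gradient that the next section derives.

```python
# Minimal sketch (illustrative data, not from the post): fit y = w*x by
# repeatedly adjusting w against the error of the current line.
xs = [50.0, 80.0, 100.0, 120.0]    # e.g. house sizes
ys = [100.0, 165.0, 195.0, 245.0]  # e.g. house prices

w = 0.5     # an arbitrary initial slope
lr = 1e-5   # learning rate: how strongly the error adjusts w

for _ in range(1000):
    # slope of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # move w downhill, against the error

print(round(w, 3))
```

After enough iterations, w settles near the least-squares slope for these points, which is sum(x·y)/sum(x²).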

2: Find the gap: cost function
(Note: the loss function (Loss Function) is defined on a single sample and measures the error of that one sample. The cost function (Cost Function) is defined on the entire training set and is the average of all the sample errors, i.e., the average of the loss function.)

  First, we need to quantify how far the data deviate, that is, the error. The most common measure is the mean squared error, which, as the name suggests, is the average of the squared errors.
  Let's first look at one point p1(x1, y1); its error e1 (a squared error here) is e1 = (y1 - wx1)².

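As a tiny numeric illustration (the sample values and the slope guess are invented), the squared error of one point is just the gap between the predicted and actual y, squared:

```python
# Squared error of a single sample point (x1, y1) under y = w*x.
x1, y1 = 100.0, 195.0  # e.g. size and price of one house (made up)
w = 2.0                # current slope guess

e1 = (y1 - w * x1) ** 2  # prediction is 200, actual is 195
print(e1)
```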
  Expanding with the complete-square formula gives e1 = x1²w² - 2x1y1w + y1². Similarly, the errors e2, e3, …, en of points p2, p3, …, pn all have the same form. Our goal is the average of the errors of all the points, noting that the x's, the y's, and the number of samples n are all known constants.
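The expansion is easy to sanity-check numerically. With any made-up sample values, the direct squared error and the expanded three-term form agree:

```python
# Check that (y - w*x)^2 == x^2*w^2 - 2*x*y*w + y^2 for sample values.
x1, y1, w = 100.0, 195.0, 2.0  # illustrative numbers

direct   = (y1 - w * x1) ** 2
expanded = x1**2 * w**2 - 2 * x1 * y1 * w + y1**2
print(direct, expanded)
```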

  Therefore, by merging like terms and replacing the coefficients of the three kinds of terms with constants a, b, and c respectively, the total error becomes a simple quadratic function of w. -------- To be continued
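The merging step the post leaves for next time can be sketched as follows (the data and the names a, b, c as mean-based coefficients are my assumptions, following the derivation above): averaging the expanded per-point errors gives a single quadratic in w, e(w) = a·w² + b·w + c, whose minimum is the best-fit slope.

```python
# Sketch (illustrative data): collapse the averaged errors into
#   e(w) = a*w^2 + b*w + c
# with a = mean(x^2), b = -2*mean(x*y), c = mean(y^2).
xs = [50.0, 80.0, 100.0, 120.0]    # made-up sizes
ys = [100.0, 165.0, 195.0, 245.0]  # made-up prices
n = len(xs)

a = sum(x * x for x in xs) / n
b = -2 * sum(x * y for x, y in zip(xs, ys)) / n
c = sum(y * y for y in ys) / n

def cost(w):
    """Mean squared error as a quadratic function of the slope w."""
    return a * w * w + b * w + c

# An upward-opening parabola has its minimum at w = -b / (2a);
# that is exactly the slope gradient descent walks toward.
w_best = -b / (2 * a)
print(round(w_best, 3))
```

Because a > 0 whenever the x's are not all zero, the cost curve is a parabola with a single lowest point, which is what makes the "slide downhill" picture of gradient descent work.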

Origin blog.csdn.net/weixin_45185577/article/details/127471868