Understand the cost function

Q: Why is it worth understanding the cost function?

A: Nearly all of ML, from linear regression to logistic regression and beyond, is built around a cost function.

To understand the cost function, ask three questions: What is it? How does it work? Why does it take this form?

 

1. What is the cost function?

  The cost function is the function we minimize in order to find the optimal solution; that is its role.

  The loss function (Loss Function) is defined on a single sample: it measures the error of that one sample.

  The cost function (Cost Function) is defined on the entire training set: it is the average error over all samples, i.e., the average of the loss function.

  The objective function (Objective Function) is the function that is ultimately optimized. It equals the empirical risk plus the structural risk (i.e., cost function + regularization term).
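  The three definitions above can be sketched in a few lines of Python. This is only an illustration: the squared error as the per-sample loss, the L2 penalty as the regularization term, the weight `lam`, and all of the numbers are assumptions chosen for the example, not something the text prescribes.

```python
def squared_loss(prediction, target):
    """Loss: the error of a single sample (squared error, an assumed choice)."""
    return (prediction - target) ** 2


def cost(predictions, targets):
    """Cost: the average loss over the whole training set."""
    losses = [squared_loss(p, t) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)


def objective(predictions, targets, weights, lam=0.1):
    """Objective: cost plus a regularization term (assumed L2 penalty)."""
    penalty = lam * sum(w ** 2 for w in weights)
    return cost(predictions, targets) + penalty


# Made-up predictions, targets and model weights for illustration.
predictions = [1.0, 2.5, 2.0]
targets = [1.0, 2.0, 3.0]
weights = [0.5, -1.0]

print(squared_loss(2.5, 2.0))                    # loss of one sample
print(cost(predictions, targets))                # average loss over the set
print(objective(predictions, targets, weights))  # cost + penalty
```

  The loss looks at one sample, the cost averages over the set, and the objective adds the penalty on top of the cost.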

 

2. How the cost function works

  For regression problems, we find the optimal solution by minimizing a cost function; the most commonly used one is the squared error cost function.

  For example, assume the following hypothesis function:

  h(x) = θ0 + θ1·x

  It has two parameters, θ0 and θ1. Changing them changes the hypothesis function: θ0 shifts the line up or down, and θ1 changes its slope.

  In a realistic setting, the data come to us as many scattered points. To solve the regression problem we fit a straight line to these points, that is, we find the θ0 and θ1 that make the line represent all of the data as well as possible.
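  To see how the two parameters change the line, here is a minimal sketch. It assumes the hypothesis is the straight line h(x) = θ0 + θ1·x; the function name `hypothesis` and the parameter values tried are made up for illustration.

```python
def hypothesis(x, theta0, theta1):
    """Straight-line hypothesis: theta0 is the intercept, theta1 the slope."""
    return theta0 + theta1 * x


xs = [0, 1, 2, 3]

# Three illustrative parameter settings: a flat line, a line through the
# origin, and a line with both an intercept and a slope.
for theta0, theta1 in [(1.5, 0.0), (0.0, 0.5), (1.0, 0.5)]:
    ys = [hypothesis(x, theta0, theta1) for x in xs]
    print(f"theta0={theta0}, theta1={theta1}: {ys}")
```

  Each parameter pair gives a different line over the same inputs; fitting means choosing the pair whose line lies closest to the data.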

  How do we find that optimal solution? By minimizing a cost function. The squared error cost function serves as the example here.

  Start from the simplest case, a single parameter, and assume the hypothesis function (with θ0 fixed at 0):

  h(x) = θ1·x

  The main idea of the squared error cost function is to take the difference between the value on the fitted line and the actual value given by the data. This gives the gap between the fitted line and reality.

  So that this value does not swing wildly because of individual extreme data points, the squared gaps are averaged over all samples, in a variance-like fashion, and then halved.

  This gives the cost function, where m is the number of samples:

  J(θ1) = 1/(2m) · Σ (h(x_i) − y_i)², summed over i = 1, ..., m

  The optimal solution is the minimum of the cost function. Evaluating the cost function formula at many parameter values traces out its graph.

  Solving (by taking the derivative) confirms that the cost function reaches its minimum where the abscissa, i.e. the parameter value, is exactly 1.
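  The idea of evaluating the cost function repeatedly can be tried directly. The sketch below assumes the single-parameter hypothesis h(x) = θ1·x and a made-up data set that lies exactly on the line y = x, so the minimum is expected at θ1 = 1.

```python
def cost(theta1, xs, ys):
    """Squared error cost J(theta1) = 1/(2m) * sum((theta1*x - y)^2)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Assumed toy data lying exactly on y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Evaluate the cost at several candidate parameter values.
thetas = [0.0, 0.5, 1.0, 1.5, 2.0]
costs = [cost(t, xs, ys) for t in thetas]
for t, c in zip(thetas, costs):
    print(f"theta1={t:.1f}  J={c:.4f}")

# Pick the candidate with the smallest cost.
best = thetas[costs.index(min(costs))]
print("minimum at theta1 =", best)
```

  The costs fall as θ1 approaches 1 from either side and reach zero at θ1 = 1, which is the bowl shape the graph of the cost function shows.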

  With more parameters things get more complicated. With just two parameters, the graph is already a three-dimensional surface.

  The height is the value of the cost function, and you can see that it still has a minimum. With even more parameters the graph can no longer be visualized, but the principle is the same.
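  The two-parameter case can be sketched without a plot by evaluating the cost on a grid of (θ0, θ1) pairs and reading off the lowest point. The data (generated from y = 1 + 2x) and the grid spacing are assumptions made for this example.

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost for the line h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Assumed toy data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

# Candidate values -1.0, -0.5, ..., 3.0 for each parameter.
grid = [i * 0.5 for i in range(-2, 7)]

# The "surface": one cost value (height) per (theta0, theta1) pair.
surface = {(t0, t1): cost(t0, t1, xs, ys) for t0 in grid for t1 in grid}

# The lowest point of the surface.
best = min(surface, key=surface.get)
print("lowest point:", best, "cost:", surface[best])
```

  A grid search like this only works for a handful of parameters, which is why the minimum is normally found analytically or iteratively instead.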

  Thus, a regression problem comes down to finding the minimum of the cost function:

  minimize J(θ0, θ1) over θ0 and θ1
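  The text does not say how the minimum is found. One common approach is gradient descent, sketched below under several assumptions: a straight-line hypothesis h(x) = θ0 + θ1·x, the squared error cost with the 1/(2m) factor, a hand-picked learning rate `lr` and step count, and made-up data.

```python
def gradient_descent(xs, ys, lr=0.1, steps=2000):
    """Minimize J(theta0, theta1) = 1/(2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to each parameter.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Step downhill, updating both parameters together.
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1


# Assumed toy data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

theta0, theta1 = gradient_descent(xs, ys)
print(f"theta0={theta0:.4f}, theta1={theta1:.4f}")
```

  On this data the parameters approach θ0 = 1 and θ1 = 2, the line the points were generated from. The learning rate and step count would need tuning for other data.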

 

3. Why does the cost function take this form?

First, consider: what is the cost?

  Put simply, the cost is the gap (the distance between two points) between the predicted value and the actual value. For multiple samples, it is the sum of those gaps.

The problem of positive and negative signs:

  Suppose we use the signed difference between the hypothesis and the actual value directly, and add it up over all samples. That might look like the cost. But for a single sample the difference can be positive or negative, so when the costs of all samples are added, the positive and negative terms can cancel each other out. This is therefore not a suitable cost function.
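  The cancellation is easy to see on a small made-up example. The predictions and targets below are assumptions chosen so that the errors are symmetric.

```python
# Made-up predictions that miss low, hit exactly, and miss high.
predictions = [1.0, 4.0, 7.0]
targets = [3.0, 4.0, 5.0]

# Signed differences: [-2.0, 0.0, 2.0]
signed = [p - t for p, t in zip(predictions, targets)]

signed_total = sum(signed)                     # positives and negatives cancel
absolute_total = sum(abs(d) for d in signed)   # signs removed by absolute value
squared_total = sum(d ** 2 for d in signed)    # signs removed by squaring

print("signed sum:  ", signed_total)
print("absolute sum:", absolute_total)
print("squared sum: ", squared_total)
```

  The signed sum is zero even though two of the three predictions are wrong, while the absolute and squared sums both register the errors.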

Solving the sign problem:

  The absolute value could be used to represent the cost. However, to make the minimum easier to compute (it can be found with the method of least squares), the square is used instead: the squared difference represents the cost of a single sample. The cost over a data set of m samples is then:

  Σ (h(x_i) − y_i)², summed over i = 1, ..., m

Is there any problem with the plain sum of squares?

  The cost function should account for the number of samples. Otherwise, comparing the sum of squared gaps for one sample with that for m samples means very little. So the summed cost of the m samples is multiplied by 1/(2m), and the cost function becomes:

  J(θ0, θ1) = 1/(2m) · Σ (h(x_i) − y_i)², summed over i = 1, ..., m
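  A small sketch shows why the plain sum is hard to compare across data sets of different sizes. The error values and set sizes below are made up; every sample is given the same error so that only the sample count differs.

```python
def sum_of_squares(errors):
    """Plain sum of squared errors: grows with the number of samples."""
    return sum(e ** 2 for e in errors)


def cost(errors):
    """Sum of squared errors scaled by 1/(2m), m being the sample count."""
    return sum_of_squares(errors) / (2 * len(errors))


# Two assumed data sets with identical per-sample error but different sizes.
small = [0.5] * 10
large = [0.5] * 1000

print(sum_of_squares(small), sum_of_squares(large))  # differ by a factor of 100
print(cost(small), cost(large))                      # identical
```

  The fit is equally good in both cases, and only the scaled version reports that.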

  As for using 2m rather than m: this is for computational convenience, since the factor of 2 produced by differentiating the squared term cancels the 1/2.
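  The convenience shows up when the cost is differentiated. The sketch below assumes the single-parameter hypothesis h(x) = θ1·x and made-up data; it compares the analytic derivative, which has no stray factor of 2, against a finite-difference estimate of the slope of the cost.

```python
def cost(theta1, xs, ys):
    """J(theta1) = 1/(2m) * sum((theta1*x - y)^2)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient(theta1, xs, ys):
    """Analytic dJ/dtheta1: the 2 from the square cancels the 1/2,
    leaving (1/m) * sum((theta1*x - y) * x)."""
    m = len(xs)
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m


def numeric_gradient(theta1, xs, ys, eps=1e-6):
    """Central finite-difference estimate of the same derivative."""
    return (cost(theta1 + eps, xs, ys) - cost(theta1 - eps, xs, ys)) / (2 * eps)


# Assumed toy data lying exactly on y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

theta1 = 0.5
print("analytic:", gradient(theta1, xs, ys))
print("numeric: ", numeric_gradient(theta1, xs, ys))
```

  The two values agree up to rounding. With 1/m instead of 1/(2m) the derivative would carry an extra factor of 2, which changes nothing about where the minimum lies.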


Origin www.cnblogs.com/geaozhang/p/11442343.html