Chapter 7: Optimization Algorithms

Machine learning algorithm = model representation + model evaluation + optimization algorithm. What the optimization algorithm does is find, within the model representation space, the model that scores best on the model evaluation metric.

Most current machine learning toolkits have built-in implementations of the commonly used optimization algorithms, so in real applications a single line of code completes the call. However, given the important role optimization algorithms play in machine learning, understanding their underlying principles is still necessary.

 

1 Loss functions in supervised learning

Q1: What loss functions are involved in supervised learning? Please list them and briefly describe their characteristics.

A1: 0-1 loss function (sign)

  Hinge loss function (shaped like ReLU)

  logistic loss function

  Cross-entropy loss function

  For regression problems, the common loss function is the squared loss; however, the farther the predicted value is from the true value, the larger the quadratic penalty, so the squared loss is relatively sensitive to outliers. To address this, the absolute loss function can be used.

  Taking both differentiability and robustness to outliers into account, the Huber loss function can be used. (A short code sketch of these losses follows this list.)
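As an illustration (a minimal NumPy sketch, not code from the book), the losses above can be written as follows; y is the true label in {-1, +1} for the classification losses and a real value for the regression losses, f is the model output, and the Huber threshold delta=1.0 is an arbitrary default:

```python
import numpy as np

def zero_one_loss(y, f):
    """0-1 loss: 1 if the sign of the prediction disagrees with the label."""
    return (y * f <= 0).astype(float)

def hinge_loss(y, f):
    """Hinge loss: max(0, 1 - y*f), shaped like a shifted ReLU."""
    return np.maximum(0.0, 1.0 - y * f)

def logistic_loss(y, f):
    """Logistic loss: log2(1 + exp(-y*f))."""
    return np.log2(1.0 + np.exp(-y * f))

def squared_loss(y, f):
    """Squared loss for regression; penalizes outliers quadratically."""
    return (y - f) ** 2

def huber_loss(y, f, delta=1.0):
    """Huber loss: quadratic near zero, linear for large residuals."""
    r = np.abs(y - f)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))
```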

 

2 Optimization problems in machine learning

Q1: Among the optimization problems encountered in machine learning, which are convex and which are non-convex? Please give an example of each.

A1: For the definition of a convex function, see this blog post: https://cloud.tencent.com/developer/news/335461
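For reference, the standard definition (a textbook statement, not quoted from the book) can be written as a short math block: a function L(·) is convex if, for any points x, y in its domain and any λ ∈ [0, 1],

```latex
L\bigl(\lambda x + (1-\lambda)y\bigr) \;\le\; \lambda L(x) + (1-\lambda) L(y)
```

Geometrically, a convex function lies below the chord connecting any two points on its graph.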

  In a convex optimization problem, every local minimum is a global minimum, so such problems are generally considered relatively easy to solve. Examples of convex optimization problems include support vector machines and linear regression (and other linear models). In contrast, the optimization problem corresponding to principal component analysis is non-convex; other examples of non-convex optimization problems include low-rank models (e.g., matrix factorization) and deep neural network models.

 

3 Classical optimization algorithms

Q1: What optimization methods are there for unconstrained optimization problems?

A1: Classical optimization algorithms can be divided into two categories: direct methods and iterative methods.

  The direct method requires the objective function to satisfy two conditions. The first condition is that L(·) is a convex function; the second is that the optimality equation has a closed-form solution. A classic example that satisfies both conditions is ridge regression. Because the direct method must satisfy these two conditions, its scope of application is limited; thus, in many practical problems, we turn to iterative methods.
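As a concrete illustration of the direct method, here is a minimal sketch of the ridge-regression closed-form solution (synthetic data and an arbitrary regularization strength lam chosen for demonstration; this is not the book's code):

```python
import numpy as np

# Ridge regression: L(w) = ||Xw - y||^2 + lam * ||w||^2 is convex, and setting its
# gradient to zero gives the closed-form solution w = (X^T X + lam*I)^{-1} X^T y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=50)
lam = 0.1

w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w)
```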

  Iterative methods iteratively refine the estimate of the optimal solution. (First-order methods are also called gradient descent, and second-order methods are also called Newton's method.)
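As a quick reminder, the two iterative families correspond to the following standard update rules (textbook forms, not quoted from the book), where α is the step size, ∇L the gradient, and ∇²L the Hessian of the objective:

```latex
% First-order method (gradient descent)
\theta_{t+1} = \theta_t - \alpha \,\nabla L(\theta_t)

% Second-order method (Newton's method)
\theta_{t+1} = \theta_t - \bigl(\nabla^2 L(\theta_t)\bigr)^{-1} \nabla L(\theta_t)
```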

 

4 Gradient verification

Q1: How can we verify that the gradient function of an objective function is correct?

A1: A mathematical derivation based on the definition of the derivative, comparing the analytic gradient against a finite-difference approximation. See pp. 152-154.
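In practice, the usual way to apply this is a numerical gradient check: compare the analytic gradient with a symmetric finite-difference approximation, whose error shrinks on the order of h². A minimal sketch (the test function and step size h are illustrative choices, not the book's code):

```python
import numpy as np

def numerical_gradient(f, theta, h=1e-5):
    """Symmetric finite-difference approximation of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (f(theta + e) - f(theta - e)) / (2.0 * h)
    return grad

# Example: check the analytic gradient of f(theta) = sum(theta^2).
f = lambda t: np.sum(t ** 2)
analytic_grad = lambda t: 2.0 * t

theta = np.array([1.0, -2.0, 0.5])
diff = np.max(np.abs(analytic_grad(theta) - numerical_gradient(f, theta)))
print("max difference:", diff)   # should be tiny (on the order of h^2)
```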

  

5 Stochastic gradient descent

Q1: When the amount of training data is particularly large, what problems does classical gradient descent have, and how can it be improved?

A1: Classical gradient descent must traverse all of the training data for every update of the model parameters. When the number of training samples M is large, this requires a great deal of computation and takes a long time, making it basically infeasible in practice.

  To reduce the variance of the stochastic gradient, make the iterative algorithm more stable, and take advantage of highly optimized matrix operations, in practice we process a small batch of training data at a time; this is called mini-batch gradient descent (Mini-Batch Gradient Descent).

Q2: How should the batch size m be chosen?

A2: m is generally chosen as a power of 2, which makes full use of optimized matrix operations.

Q3: How should the m training samples be chosen?

A3: To avoid the effect a particular ordering of the data might have on convergence, the training data are usually shuffled randomly before each pass; m training samples are then taken in order at each iteration until all of the data have been traversed once.

Q4: How should the learning rate α be chosen?

A4: To speed up convergence while improving solution accuracy, a learning-rate decay scheme is often used: start the algorithm with a larger learning rate, and when the error curve enters a plateau, reduce the learning rate to make finer adjustments.

 

In short, mini-batch gradient descent is usually used to deal with the problem of an excessively large amount of training data.
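Putting Q2-Q4 together, a minimal sketch of mini-batch gradient descent with per-epoch shuffling and learning-rate decay might look like this (the function name, multiplicative decay schedule, and hyperparameter defaults are assumptions for illustration, not the book's code):

```python
import numpy as np

def minibatch_sgd(grad_fn, theta, data, m=64, alpha0=0.1, decay=0.99, epochs=10):
    """Mini-batch gradient descent with per-epoch shuffling and learning-rate decay.

    grad_fn(theta, batch) must return the gradient computed on the given mini-batch.
    m is the batch size (a power of two to exploit vectorized matrix operations).
    """
    alpha = alpha0
    for epoch in range(epochs):
        np.random.shuffle(data)                    # shuffle before each pass
        for start in range(0, len(data), m):
            batch = data[start:start + m]          # take m samples in order
            theta = theta - alpha * grad_fn(theta, batch)
        alpha *= decay                             # decay the learning rate
    return theta
```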

 

6 Accelerating stochastic gradient descent

Q1: Why stochastic gradient descent can fail ---- groping one's way down the mountain

A1: The book gives a very vivid metaphor for this problem; see pp. 158-159.

Q2: The way out ---- maintaining inertia and environmental awareness

A2: Momentum method ---- the name alone, linked with high-school physics, brings inertia to mind. The example the author gives on p. 160 is particularly nice. Compared with stochastic gradient descent, the momentum method converges faster and its convergence curve is more stable.

  AdaGrad ---- uses the "historical sum of squared gradients" to measure how sparse each parameter's gradients have been; the smaller the accumulated value, the sparser the gradient.

  Adam ---- combines the two advantages of maintaining inertia and environmental awareness. In addition, Adam applies bias correction to account for the zero initialization of the moment estimates.

  AdaDelta and RMSProp ---- these two methods are very similar; both are improvements over AdaGrad. (Minimal sketches of the momentum and Adam updates follow this list.)
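For reference, minimal sketches of the momentum and Adam update rules discussed above (standard textbook forms; the hyperparameter defaults are common conventions, not values taken from the book):

```python
import numpy as np

def momentum_step(theta, grad, velocity, alpha=0.01, gamma=0.9):
    """Momentum: keep 'inertia' by accumulating an exponentially decayed velocity."""
    velocity = gamma * velocity - alpha * grad
    return theta + velocity, velocity

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: combines inertia (first moment m) with environmental awareness
    (second moment v, an AdaGrad/RMSProp-style squared-gradient history),
    plus bias correction for the zero initial values of m and v."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)       # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```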

 

7 Regularization and sparsity

Q1: What is the principle by which L1 regularization makes the model parameters sparse?

A1: This is very important; it can be answered from three perspectives: the shape of the solution space, function superposition, and a Bayesian prior.

The book explains this part very well; go back and study pp. 164-168 carefully.
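To see the sparsity effect empirically, here is a small demo using scikit-learn's Lasso (L1) and Ridge (L2) on synthetic data (the data, alpha values, and zero threshold are arbitrary illustrative choices, not from the book):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n_samples, n_features = 100, 20
X = rng.normal(size=(n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:3] = [2.0, -3.0, 1.5]            # only 3 features actually matter
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

lasso = Lasso(alpha=0.1).fit(X, y)       # L1 penalty
ridge = Ridge(alpha=0.1).fit(X, y)       # L2 penalty

print("zero coefficients (Lasso):", np.sum(np.abs(lasso.coef_) < 1e-8))
print("zero coefficients (Ridge):", np.sum(np.abs(ridge.coef_) < 1e-8))
# Lasso typically drives most irrelevant coefficients exactly to zero;
# Ridge only shrinks them toward zero.
```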

 

Source: www.cnblogs.com/guohaoblog/p/11220587.html