Optimization Algorithms in Machine Learning

Basics of Machine Learning and Optimization

Machine Learning and Optimization

As Pedro Domingos puts it, machine learning is composed of three parts: model representation, optimization, and model evaluation. We transform a practical problem into a model to be solved, use an optimization algorithm to solve the model, evaluate the model on validation or test data, and repeat these three steps until a satisfactory model is obtained.

The optimization algorithm therefore plays a connecting role in machine learning!

The optimization problems involved in machine learning can generally be expressed as regularized empirical risk minimization:

$$\min_{w} \; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i; w),\, y_i\big) + \lambda R(w)$$

where $f(\cdot; w)$ is the model with parameters $w$, $\ell$ is the loss on training example $(x_i, y_i)$, and $R(w)$ is a regularization term. Linear regression, logistic regression, SVMs, neural networks, and so on all fit this template with different choices of $f$, $\ell$, and $R$.
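As a concrete instance, the short sketch below evaluates this objective for ridge regression (squared loss with $R(w) = \|w\|^2$); the function name and the toy data are purely illustrative.

```python
import numpy as np

# Illustrative evaluation of the regularized empirical risk for ridge regression:
# squared loss plus R(w) = ||w||^2.
def ridge_objective(w, X, y, lam):
    n = X.shape[0]
    residuals = X @ w - y                      # f(x_i; w) - y_i for every sample
    data_term = np.sum(residuals ** 2) / n     # (1/n) * sum of per-sample losses
    reg_term = lam * np.sum(w ** 2)            # lambda * R(w)
    return data_term + reg_term

# Toy data: 100 samples, 5 features (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)
print(ridge_objective(np.zeros(5), X, y, lam=0.1))
```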

Basics of Optimization Algorithms

The order of the optimization algorithm

The so-called order of an optimization algorithm refers to which information about the objective function $f(x)$ the algorithm uses: a zeroth-order algorithm uses only function values $f(x)$, a first-order algorithm uses gradients $\nabla f(x)$, and a second-order algorithm also uses second-derivative (Hessian) information $\nabla^2 f(x)$.

If the functional form is unknown, or the gradient is hard to compute or does not exist, zeroth-order algorithms are often used. In machine learning, first-order algorithms are the most common; second-order algorithms may converge faster, but the per-iteration computational cost is also greater.
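A small sketch of what each order has to work with, on a toy quadratic (all names and constants are illustrative): a zeroth-order method can only query $f$, so it might approximate the gradient by finite differences, while first- and second-order methods use the gradient and Hessian directly.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x^T A x - b^T x with known gradient and Hessian.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

f = lambda x: 0.5 * x @ A @ x - b @ x            # zeroth-order information: values only
grad = lambda x: A @ x - b                        # first-order information: gradient
hess = lambda x: A                                # second-order information: Hessian

x = np.array([2.0, 2.0])

# A zeroth-order method can only query f, e.g. estimating the gradient by finite differences.
eps = 1e-6
fd_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(2)])

gd_step = x - 0.1 * grad(x)                           # first-order update (gradient descent)
newton_step = x - np.linalg.solve(hess(x), grad(x))   # second-order update (Newton's method)

print(fd_grad, grad(x))        # the finite-difference estimate matches the true gradient
print(gd_step, newton_step)
```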

Common components of an optimization algorithm

  • gradient descent

Gradient descent is the basic first-order method: starting from an initial point, it repeatedly takes a step in the direction of the negative gradient, $x_{k+1} = x_k - \eta \nabla f(x_k)$, where $\eta$ is the step size (learning rate).
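A minimal gradient descent loop, assuming we can evaluate the gradient of the objective (the function name and toy problem are illustrative):

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, n_iters=100):
    """Plain gradient descent: x_{k+1} = x_k - lr * grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3); the minimizer is x = 3.
print(gradient_descent(lambda x: 2 * (x - 3), x0=[0.0]))
```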

  • dual

When solving an optimization problem, if its dual form is easier to solve, we can often avoid tackling the original (primal) problem directly by solving the dual problem instead. The classic SVM in machine learning, for example, involves duality theory, along with concepts such as the Lagrange multiplier method and the KKT conditions. First, let me briefly explain what each of these concepts is for.

  1. Duality theory: duality means "twins"; every optimization problem has a corresponding sibling optimization problem, its dual.

  2. Lagrangian function: combines the objective function and the constraints of the original optimization problem into a single function.

  3. KKT conditions: conditions that an optimal solution of the problem must satisfy.

If the original problem is convex, the KKT conditions are both necessary and sufficient; that is, a point satisfying the KKT conditions is an optimal solution of both the primal and the dual problem, so solving the KKT conditions can take the place of solving the original problem directly. (The detailed derivation is not expanded here; it deserves a separate article.)
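As a tiny illustration of the Lagrangian and KKT conditions (not the SVM derivation), consider minimizing $\|x\|^2$ subject to a single linear equality constraint $a^\top x = c$. Writing the Lagrangian $L(x,\nu) = x^\top x + \nu(a^\top x - c)$ and setting its gradient to zero, together with primal feasibility, gives a small linear KKT system; the sketch below, with made-up numbers, solves it directly.

```python
import numpy as np

# Toy equality-constrained QP: minimize ||x||^2  subject to  a^T x = c.
# Lagrangian: L(x, nu) = x^T x + nu * (a^T x - c).
# Stationarity: 2x + nu * a = 0;  primal feasibility: a^T x = c.
a = np.array([1.0, 2.0, -1.0])
c = 4.0

# Stack the KKT conditions as one linear system in (x, nu) and solve it.
n = a.size
KKT = np.block([[2 * np.eye(n), a.reshape(-1, 1)],
                [a.reshape(1, -1), np.zeros((1, 1))]])
rhs = np.concatenate([np.zeros(n), [c]])
sol = np.linalg.solve(KKT, rhs)
x_opt, nu_opt = sol[:n], sol[n]

print(x_opt, a @ x_opt)   # x_opt satisfies the constraint a^T x = c
```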

  • randomize

When the data set or the number of variables is very large, injecting randomness, for example sampling a subset of the data or of the coordinates at each iteration, keeps the per-iteration cost manageable; this is discussed further in the section on big data below.

Typical Algorithms for Unconstrained Problems

  • Gradient Descent

Gradient descent was already described above and will not be repeated here.

Classic Algorithms for Constrained Problems

  • Projected gradient descent

As the name suggests, the idea of this method is gradient descent plus a projection operation to satisfy the constraints. It can be understood as a two-stage algorithm: in the first stage, an ordinary gradient descent step is taken, $y_{k+1} = x_k - \eta \nabla f(x_k)$; in the second stage, the result is projected back onto the feasible set $C$, giving $x_{k+1} = \Pi_C(y_{k+1})$.
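A minimal sketch, assuming the constraint set is simple enough that projection has a closed form (here nonnegativity, where projection just clips negative entries; all names are illustrative):

```python
import numpy as np

def projected_gradient_descent(grad, project, x0, lr=0.1, n_iters=200):
    """Two-stage update: gradient step, then projection back onto the feasible set."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        y = x - lr * grad(x)   # stage 1: unconstrained gradient step
        x = project(y)         # stage 2: project onto the constraint set
    return x

# Example: minimize ||x - t||^2 subject to x >= 0 (projection = clip negatives to zero).
t = np.array([1.0, -2.0, 3.0])
grad = lambda x: 2 * (x - t)
project = lambda y: np.maximum(y, 0.0)
print(projected_gradient_descent(grad, project, x0=np.zeros(3)))  # -> approximately [1, 0, 3]
```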

  • coordinate descent method

The idea of coordinate descent is somewhat similar to ADMM: at each step only one variable (or one block of variables) is optimized while all the others are held fixed, that is,

$$x_i^{k+1} = \arg\min_{x_i} \; f\big(x_1^{k+1}, \dots, x_{i-1}^{k+1},\, x_i,\, x_{i+1}^{k}, \dots, x_n^{k}\big).$$

It is a bit like working in a high-dimensional coordinate system and optimizing along one dimension at a time, in turn.
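A minimal coordinate descent sketch on a quadratic objective $f(x) = \tfrac{1}{2}x^\top A x - b^\top x$, where each one-dimensional subproblem can be minimized exactly in closed form (the function name and the matrix are illustrative):

```python
import numpy as np

def coordinate_descent(A, b, n_sweeps=100):
    """Minimize 0.5 * x^T A x - b^T x by exact minimization over one coordinate at a time."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(n_sweeps):
        for i in range(n):
            # Holding the other coordinates fixed, f is a 1-D quadratic in x_i;
            # setting the i-th partial derivative to zero gives its exact minimizer.
            x[i] = (b[i] - A[i, :] @ x + A[i, i] * x[i]) / A[i, i]
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
print(coordinate_descent(A, b), np.linalg.solve(A, b))  # both approach the same minimizer
```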

When Optimization Problems Meet Big Data

When the amount of data is large, the simplest remedy is randomization: gradient descent becomes stochastic gradient descent, which computes the gradient on a random mini-batch instead of the full data set, and coordinate descent becomes randomized coordinate descent, which updates a randomly chosen coordinate at each step.
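A mini-batch stochastic gradient descent sketch for a least-squares objective; the batch size, learning rate, and toy data are illustrative choices, not tuned values:

```python
import numpy as np

def sgd_least_squares(X, y, lr=0.01, batch_size=16, n_iters=2000, seed=0):
    """Stochastic gradient descent for least squares: each step uses a random mini-batch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        idx = rng.integers(0, n, size=batch_size)        # sample a mini-batch
        Xb, yb = X[idx], y[idx]
        g = 2 * Xb.T @ (Xb @ w - yb) / batch_size        # mini-batch gradient estimate
        w -= lr * g
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=1000)
print(sgd_least_squares(X, y))   # approaches w_true
```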

Accelerated Optimization and Outlook

So-called accelerated optimization studies how to improve an algorithm so as to increase its convergence speed without making stronger assumptions. Common examples include the heavy-ball method, Nesterov's accelerated gradient descent, the accelerated proximal gradient method (APG), stochastic variance-reduced gradient methods, and so on. These algorithms are somewhat beyond the scope of this overview; readers who are interested in, or specialize in, this kind of problem can refer to Zhouchen Lin's recent book.
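For flavor, here is a sketch of Nesterov-style acceleration: the gradient is evaluated at an extrapolated ("look-ahead") point rather than the current iterate. The momentum schedule $(k-1)/(k+2)$ is one common choice; all names are illustrative.

```python
import numpy as np

def nesterov_agd(grad, x0, lr=0.1, n_iters=100):
    """Nesterov's accelerated gradient: take the gradient step at an extrapolated point."""
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()
    for k in range(1, n_iters + 1):
        momentum = (k - 1) / (k + 2)              # one common momentum schedule
        y = x + momentum * (x - x_prev)           # look-ahead (extrapolation) point
        x_prev = x
        x = y - lr * grad(y)                      # gradient step at the look-ahead point
    return x

# Same toy problem as plain gradient descent: minimize (x - 3)^2.
print(nesterov_agd(lambda x: 2 * (x - 3), x0=np.array([0.0])))
```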

Some research on large-scale optimization can be carried out from the following perspectives: stochastic optimization, distributed optimization, asynchronous optimization, learning-based optimization, and so on.

 

Original post: blog.csdn.net/qq_29788741/article/details/131443803