Inventory of common loss functions and optimization algorithms in machine learning

In machine learning, different books define objective functions, loss functions, cost functions, and so on in different ways. Generally speaking, the objective function measures the quality of a model, and fitting the model means maximizing or minimizing it. When it is posed as a minimization problem, the objective is also called the loss function or the cost function; in most cases there is no strict distinction among these terms. A loss function consists of a loss term and a regularization term. The purpose of the regularization term is to improve the generalization ability of the model and prevent overfitting. This article discusses only the loss term; the loss terms of some common loss functions are listed below.

1. Gold Standard Loss

Also known as the 0-1 loss, it counts the number of misclassifications.
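As a minimal sketch (the function name is illustrative, not from the article), the 0-1 loss simply counts mismatches between predictions and labels:

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    """0-1 loss: the number of misclassified samples."""
    return np.sum(y_true != y_pred)

# Two of the four predictions are wrong, so the loss is 2.
print(zero_one_loss(np.array([1, -1, 1, 1]), np.array([1, 1, -1, 1])))  # 2
```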

2. Hinge Loss

Most commonly used for maximum-margin classification in SVMs. For a target output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as:

L(y) = max(0, 1 - t*y)

Hinge loss can be further subdivided into the plain hinge loss (sometimes simply called L1 loss) and the squared hinge loss (sometimes called L2 loss); note that these are distinct from L1/L2 regularization.
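A minimal NumPy sketch of both variants (function names are illustrative):

```python
import numpy as np

def hinge_loss(t, y):
    """Hinge loss for labels t in {-1, +1} and classifier scores y."""
    return np.maximum(0, 1 - t * y)

def squared_hinge_loss(t, y):
    """Squared hinge loss: penalizes margin violations quadratically."""
    return np.maximum(0, 1 - t * y) ** 2

t = np.array([1, -1, 1])
y = np.array([0.8, 0.3, -0.5])   # raw classifier scores
print(hinge_loss(t, y))          # [0.2 1.3 1.5]
print(squared_hinge_loss(t, y))  # [0.04 1.69 2.25]
```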

3. Log Loss (logarithmic loss)

Because the logarithm is monotonic, optimizing the log of an objective yields the same optimum as the original problem. For objectives containing products (such as the maximum likelihood function), taking the log converts the product into a sum, which greatly simplifies the optimization.

In addition, since log is monotonically increasing, a negative sign is usually added to turn maximization into minimization, giving the commonly used negative log function (negative log-likelihood).
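As an illustrative sketch, the binary log loss (negative log-likelihood) can be computed as follows; the eps clipping is a common numerical safeguard, not something from the article:

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-12):
    """Negative log-likelihood for binary labels y in {0, 1}
    and predicted probabilities p = P(y = 1)."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.6])
print(log_loss(y, p))  # ≈ 0.2798
```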

4. Squared Loss

That is, the sum of squared differences between the true and predicted values. It is typically used in linear models, such as linear regression. The square is used, rather than the absolute value or the cube, because under a Gaussian noise assumption maximum likelihood estimation is equivalent to minimizing the squared loss; for a detailed derivation see https://zhuanlan.zhihu.com/p/26171777.
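A minimal sketch of the squared loss (the function name is illustrative):

```python
import numpy as np

def squared_loss(y_true, y_pred):
    """Sum of squared differences between true and predicted values."""
    return np.sum((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5, 0.0, 2.1])
print(squared_loss(y_true, y_pred))  # 0.51
```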

5. Exponential Loss

The exponential function is monotonic and non-negative, and the loss shrinks as the prediction approaches the correct result; AdaBoost uses the exponential loss as its objective function. One problem with the exponential loss, however, is that the weights of misclassified samples grow exponentially: if a sample is an outlier, it can severely distort the learning of subsequent base classifiers, which is a known drawback of AdaBoost.
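A minimal sketch of the exponential loss exp(-t*y) used in AdaBoost's objective; the example values are illustrative and show how a badly misclassified sample dominates:

```python
import numpy as np

def exponential_loss(t, y):
    """Exponential loss for labels t in {-1, +1} and scores y."""
    return np.exp(-t * y)

t = np.array([1, -1, 1])
y = np.array([0.8, 0.3, -2.0])
# The badly misclassified third sample (t = 1, y = -2.0) receives an
# exponentially larger loss, illustrating the sensitivity to outliers.
print(exponential_loss(t, y))  # [0.449 1.350 7.389]
```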

In addition, there is the absolute-value loss, which is typically used in regression.

Below are the loss functions of common machine learning algorithms. For a detailed treatment of the underlying principles, consult the relevant references.

[Figure: loss functions of common machine learning algorithms]

It is worth noting that many of the objective functions above have no closed-form solution; for such problems, iterative algorithms are typically used. Below are some common optimization algorithms.

[Figure: common optimization algorithms and their advantages and disadvantages]
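As a minimal illustration of such an iterative method, the following sketches plain gradient descent on a one-dimensional quadratic; the function name and hyperparameters are illustrative:

```python
def gradient_descent(grad, x0, lr=0.1, n_iters=100):
    """Plain gradient descent: a typical iterative optimizer for
    objectives without a closed-form solution."""
    x = x0
    for _ in range(n_iters):
        x = x - lr * grad(x)  # step opposite the gradient
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # ≈ 3.0
```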

Summary:

This article has surveyed the loss terms of common objective (loss) functions in machine learning, the specific objective functions of common algorithms, and, finally, common optimization algorithms along with their advantages and disadvantages. Note that this is not a complete summary; if there are omissions or mistakes, corrections are welcome!

Please credit the source when reposting. Follow this official account for more practical content!

