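1. Loss Functions

A loss function is commonly used to measure how much a model's prediction $f(x)$ disagrees with the true value $Y$. It is a non-negative real-valued function, written here as $L(Y, f(x))$. The smaller the loss, the better the model fits the data.

The loss function is the core of the empirical risk function and an important component of the structural risk function.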

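A model's structural risk function consists of an empirical risk term and a regularization term, and is usually written as:
$\theta^{*} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i; \theta)) + \lambda \Phi(\theta)$
Here the first term is the empirical risk (the average loss over the $N$ training samples), $\Phi(\theta)$ is the regularization term, and $\lambda$ controls its weight.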
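
As a rough illustration of how an empirical risk term and a regularization term combine, here is a minimal sketch. It assumes a linear model, a mean squared loss, and an L2 penalty; the function name `structural_risk` and the parameter `lam` are hypothetical choices made for this example, not something the article specifies.

```python
import numpy as np

def structural_risk(theta, X, y, lam=0.1):
    """Empirical risk (mean squared loss) plus an L2 penalty on theta.

    Assumptions for illustration: linear model X @ theta, squared loss,
    and an L2 regularizer weighted by lam.
    """
    residuals = X @ theta - y
    empirical_risk = np.mean(residuals ** 2)   # average loss over samples
    regularizer = lam * np.sum(theta ** 2)     # penalty on large parameters
    return empirical_risk + regularizer

# Three samples that lie exactly on y = 2x + 1 (first column is the bias input).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = np.array([1.0, 2.0])

print(structural_risk(theta, X, y, lam=0.0))  # empirical risk only
print(structural_risk(theta, X, y, lam=0.1))  # empirical risk plus penalty
```

With a perfect fit the empirical risk is zero, so whatever remains in the second call comes entirely from the penalty term.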

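1.1 Mean Squared Loss Function

This loss comes from the method of least squares: the best-fit line is the one that minimizes the total squared distance from all points to the regression line, with Euclidean distance as the usual distance measure. The squared loss is:
$L(Y \mid f(X)) = \sum (Y - f(X))^{2}$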
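
The formula above can be sketched in a few lines of NumPy. The function name `squared_loss` is a hypothetical helper introduced for this example.

```python
import numpy as np

def squared_loss(y_true, y_pred):
    """Sum of squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sum((y_true - y_pred) ** 2)

# Residuals are 0, -0.5 and 1, so the loss is 0 + 0.25 + 1 = 1.25.
print(squared_loss([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```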

1.2 Log Loss Function

The loss function of logistic regression is the log loss.

$L(Y \mid f(X)) = -\log P(Y \mid X)$
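
For binary classification the log loss can be sketched as follows. This assumes labels in {0, 1} and that `p_pred` holds the predicted probability $P(Y = 1 \mid X)$; the helper name `log_loss` and the clipping constant `eps` are choices made for this example.

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-12):
    """Negative log-likelihood of binary labels under predicted probabilities.

    Assumes y_true is in {0, 1} and p_pred is P(Y = 1 | X).
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0) when a prediction is exactly 0 or 1.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return -np.sum(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

# Confident correct predictions give a smaller loss than uncertain ones.
print(log_loss([1, 0], [0.9, 0.1]))
print(log_loss([1, 0], [0.5, 0.5]))
```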

1.3 Exponential Loss Function

AdaBoost uses the exponential loss. Its standard form is:
$L(Y \mid f(X)) = \exp[-Y f(X)]$
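
A small sketch of the exponential loss, assuming labels in {-1, +1} and real-valued classifier scores $f(x)$ (the usual AdaBoost convention). The helper name `exponential_loss` is hypothetical.

```python
import numpy as np

def exponential_loss(y_true, scores):
    """Sum of exp(-y * f(x)) over samples, with y in {-1, +1}."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    return np.sum(np.exp(-y_true * scores))

# A score with the same sign as the label is penalized less than one
# with the opposite sign.
print(exponential_loss([1, -1], [2.0, -2.0]))   # both correct
print(exponential_loss([1, -1], [-2.0, 2.0]))   # both wrong
```

The loss grows exponentially as the score moves further onto the wrong side of the label.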

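2. Optimization Methods

An optimization problem means: given an objective function $L(Y \mid f(X))$, find the model parameters $\theta$ (the parameters of $f$) that minimize $L(Y \mid f(X))$. In machine learning there are three common forms of gradient descent: batch gradient descent (BGD), stochastic gradient descent (SGD), and mini-batch gradient descent (Min_BGD).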

2.1 Batch Gradient Descent (BGD)

Basic idea: This is the original form of gradient descent. Every parameter update uses all of the training samples.

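Advantages: converges to the global optimum when the objective is convex; easy to parallelize.

Disadvantages: training is slow when the number of samples is large.

Suitable for: relatively small datasets.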
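
The batch update can be written out explicitly. As an assumption for illustration, take a linear model $f(x_j) = \theta^{T} x_j$ with the squared loss scaled by $\frac{1}{2}$ (the factor only simplifies the derivative and is not part of the loss formula in section 1.1):

$L(\theta) = \frac{1}{2} \sum_{j=1}^{m} (\theta^{T} x_j - y_j)^{2}$

$\nabla_{\theta} L(\theta) = \sum_{j=1}^{m} (\theta^{T} x_j - y_j)\, x_j$

$\theta \leftarrow \theta - \alpha \sum_{j=1}^{m} (\theta^{T} x_j - y_j)\, x_j$

Here $m$ is the number of samples and $\alpha$ is the step size. Every sample contributes to each update, which is why one step becomes expensive as $m$ grows.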

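import numpy as np
import matplotlib.pyplot as plt

def BGD(x_vals, y_vals):
    """Batch gradient descent for the line y = theta[1] * x + theta[0]."""
    alpha = 0.001    # step size (learning rate)
    loop_max = 100   # maximum number of iterations
    theta = np.random.randn(2)  # [bias, weight], randomly initialized
    m = len(x_vals)
    # Prepend a column of ones so that theta[0] acts as the bias term.
    x_mat = np.vstack([np.full(m, 1.0), x_vals]).T
    prev_theta = np.zeros(2)
    for i in range(loop_max):
        # Sum the squared-loss gradient over all m samples.
        grad = np.zeros(2)
        for j in range(m):
            grad += (np.matmul(theta, x_mat[j]) - y_vals[j]) * x_mat[j]
        theta = theta - alpha * grad
        # Stop once theta changes by less than the tolerance.
        if np.linalg.norm(theta - prev_theta) < 0.001:
            break
        prev_theta = theta
        print('loop count = %d' % i, '\t theta:', theta)
    return theta

if __name__ == '__main__':
    np.random.seed(0)  # make the noise and the initial theta reproducible
    # Alternative data source (requires `from sklearn import datasets`):
    # iris = datasets.load_iris()
    # x_vals = np.array([x[3] for x in iris.data])
    # y_vals = np.array([y[0] for y in iris.data])
    # Synthetic data: y = 2x + 5 plus standard normal noise.
    x_vals = np.arange(0., 10., 0.2)
    y_vals = 2 * x_vals + 5 + np.random.randn(len(x_vals))
    theta = BGD(x_vals, y_vals)

    # Plot the data points and the fitted line.
    plt.plot(x_vals, y_vals, 'g*')
    plt.plot(x_vals, theta[1] * x_vals + theta[0], 'r')
    plt.show()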

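2.2 Stochastic Gradient Descent (SGD)

Basic idea: Each parameter update uses a single sample, so the parameters are updated once per sample rather than once per pass over the data.

Advantages: training is fast.

Disadvantages: the updates are noisy, so accuracy drops and the result is not guaranteed to be the global optimum; hard to parallelize.

Suitable for: very large datasets, or online learning.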

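import numpy as np
import matplotlib.pyplot as plt

def SGD(x_vals, y_vals):
    """Stochastic gradient descent for the line y = theta[1] * x + theta[0]."""
    alpha = 0.01     # step size (learning rate)
    loop_max = 100   # maximum number of passes over the data
    theta = np.random.randn(2)  # [bias, weight], randomly initialized
    m = len(x_vals)
    # Prepend a column of ones so that theta[0] acts as the bias term.
    x_mat = np.vstack([np.full(m, 1.0), x_vals]).T
    prev_theta = np.zeros(2)
    for i in range(loop_max):
        # Visit the samples in a fresh random order on every pass
        # and update theta after each single sample.
        for j in np.random.permutation(m):
            dif = np.matmul(theta, x_mat[j]) - y_vals[j]
            theta = theta - alpha * dif * x_mat[j]

        # Stop once theta changes by less than the tolerance between passes.
        if np.linalg.norm(theta - prev_theta) < 0.001:
            break
        prev_theta = theta

        print('loop count = %d' % i, '\t theta:', theta)
    return theta

if __name__ == '__main__':
    np.random.seed(0)  # seed before any random draw so the run is reproducible

    # Synthetic data: y = 2x + 5 plus standard normal noise.
    x_vals = np.arange(0., 10., 0.2)
    y_vals = 2 * x_vals + 5 + np.random.randn(len(x_vals))

    theta = SGD(x_vals, y_vals)

    # Plot the data points and the fitted line.
    plt.plot(x_vals, y_vals, 'g*')
    plt.plot(x_vals, theta[1] * x_vals + theta[0], 'r')
    plt.show()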

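2.3 Mini-Batch Gradient Descent (Min_BGD)

Basic idea: Combine the ideas of BGD and SGD. Each update applies the BGD step to a small subset of the data.

Advantages: it reduces the drawbacks of the two methods above while keeping the strengths of both.

Suitable for: most practical settings.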
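
One detail that is easy to get wrong is how the data is split into batches. The sketch below shows one common variant: shuffle the sample indices once per pass and let the last batch be shorter when the sample count is not a multiple of the batch size. The helper name `iter_minibatches` and the use of `np.random.default_rng` are assumptions made for this example.

```python
import numpy as np

def iter_minibatches(m, batch_size, rng):
    """Yield arrays of sample indices that together cover 0..m-1 exactly once."""
    order = rng.permutation(m)  # random visiting order for this pass
    for start in range(0, m, batch_size):
        # Slicing past the end is safe, so the last batch may be shorter.
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(iter_minibatches(12, 5, rng))
print([len(b) for b in batches])  # batch sizes: [5, 5, 2]
```

Each index array can be used directly to select rows, for example `x_mat[idx]` and `y_vals[idx]`.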

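import numpy as np
import matplotlib.pyplot as plt

def min_batch(x_vals, y_vals):
    """Mini-batch gradient descent for the line y = theta[1] * x + theta[0]."""
    alpha = 0.01     # step size (learning rate)
    loop_max = 100   # maximum number of passes over the data
    batch_size = 5
    theta = np.random.randn(2)  # [bias, weight], randomly initialized
    m = len(x_vals)

    # Prepend a column of ones so that theta[0] acts as the bias term.
    x_mat = np.vstack([np.full(m, 1.0), x_vals]).T

    prev_theta = np.zeros(2)
    for j in range(loop_max):
        for start in range(0, m, batch_size):
            # Clamp the end index so a short final batch does not
            # read past the end of the data.
            end = min(start + batch_size, m)
            grad = np.zeros(2)
            for k in range(start, end):
                grad += (np.dot(theta, x_mat[k]) - y_vals[k]) * x_mat[k]
            # Average the gradient over the samples in this batch.
            theta = theta - alpha * grad / (end - start)

        # Stop once theta changes by less than the tolerance between passes.
        if np.linalg.norm(theta - prev_theta) < 0.001:
            break
        prev_theta = theta

        print('loop count = %d' % j, '\t theta:', theta)
    return theta

if __name__ == '__main__':
    np.random.seed(0)  # seed before any random draw so the run is reproducible

    # Synthetic data: y = 2x + 5 plus standard normal noise.
    x_vals = np.arange(0., 10., 0.2)
    y_vals = 2 * x_vals + 5 + np.random.randn(len(x_vals))

    theta = min_batch(x_vals, y_vals)

    # Plot the data points and the fitted line.
    plt.plot(x_vals, y_vals, 'g*')
    plt.plot(x_vals, theta[1] * x_vals + theta[0], 'r')
    plt.show()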
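
Because the model in these examples is a straight line with a squared loss, the result of any of the three variants can be sanity-checked against the closed-form least-squares solution. The sketch below uses `np.linalg.lstsq`; the helper name `closed_form_fit` is hypothetical and the comparison is a suggestion, not part of the original experiments.

```python
import numpy as np

def closed_form_fit(x_vals, y_vals):
    """Least-squares fit of y = theta[1] * x + theta[0], returned as [bias, weight]."""
    x_vals = np.asarray(x_vals, dtype=float)
    # Same design matrix layout as the gradient descent examples: [1, x].
    x_mat = np.column_stack([np.ones_like(x_vals), x_vals])
    theta, *_ = np.linalg.lstsq(x_mat, y_vals, rcond=None)
    return theta

x_demo = np.arange(0., 10., 0.2)
y_demo = 2 * x_demo + 5            # noise-free line, for a clean check
theta_ref = closed_form_fit(x_demo, y_demo)
print(theta_ref)                   # close to [5, 2]
```

A gradient descent run on the same data should approach this reference as the number of iterations grows; a large gap usually points to a step size that is too small or too few iterations.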

2. Optimization Methods: Gradient Descent (BGD, SGD, Min_BGD)
