Table of Contents
- Softmax Principles
- Notes on Some APIs
- Softmax Implementation
- Assignment Questions
- Softmax Optimization
- Softmax Applications
- References
I. Softmax Principles
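As a minimal sketch of what the implementations below compute (notation assumed here: an input matrix X of shape (N, D), weights W of shape (D, C), scores s = XW, labels y, and regularization strength reg):

$$p_{ik} = \frac{e^{s_{ik}}}{\sum_{j} e^{s_{ij}}}, \qquad L = -\frac{1}{N} \sum_{i=1}^{N} \log p_{i,\,y_i} + \text{reg} \cdot \sum W^2, \qquad \frac{\partial L}{\partial W} = \frac{1}{N} X^{\top}(P - Y) + 2 \cdot \text{reg} \cdot W$$

where P is the (N, C) matrix of softmax probabilities and Y is the one-hot encoding of y; the last expression is exactly the ds/dW computation used in the vectorized code below.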
II. Notes on Some APIs
The lambda function
Purpose: lambda defines an anonymous function.
lambda does not make the program run any faster; it only makes the code more concise.
https://blog.csdn.net/SeeTheWorld518/article/details/46959593
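A minimal self-contained illustration of the point above (the names square and pairs are just for this example):

    # An anonymous function bound to a name -- equivalent to a one-line def.
    square = lambda x: x * x
    print(square(4))                           # 16

    # lambda is most useful inline, e.g. as a sort key.
    pairs = [(1, 'b'), (0, 'c'), (2, 'a')]
    print(sorted(pairs, key=lambda p: p[1]))   # [(2, 'a'), (1, 'b'), (0, 'c')]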
III. Softmax Implementation
1. Naive softmax implementation
import numpy as np

def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops).
    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.
    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    for i in range(X.shape[0]):
        score = np.dot(X[i], W)
        score -= np.max(score)        # shift scores for numerical stability
        score = np.exp(score)         # exponentiate
        softmax_sum = np.sum(score)   # softmax denominator
        score /= softmax_sum          # normalize to get softmax probabilities
        # Accumulate the gradient.
        for j in range(W.shape[1]):
            if j != y[i]:
                dW[:, j] += score[j] * X[i]          # incorrect class: p_j * x_i
            else:
                dW[:, j] -= (1 - score[j]) * X[i]    # correct class: (p_j - 1) * x_i
        loss -= np.log(score[y[i]])   # cross-entropy for this example
    loss /= X.shape[0]                # average over the minibatch
    dW /= X.shape[0]                  # average over the minibatch
    loss += reg * np.sum(W * W)       # add the regularization term
    dW += 2 * reg * W
    return loss, dW
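As a quick sanity check (not part of the original assignment code), the analytic gradient from softmax_loss_naive can be compared against a centered-difference numerical gradient on small random data; the shapes and seed below are arbitrary:

    import numpy as np

    np.random.seed(0)
    D, C, N = 5, 3, 10
    W = np.random.randn(D, C) * 0.01
    X = np.random.randn(N, D)
    y = np.random.randint(C, size=N)
    reg = 0.1

    loss, dW = softmax_loss_naive(W, X, y, reg)

    # Centered-difference numerical gradient for comparison.
    h = 1e-5
    dW_num = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        lp, _ = softmax_loss_naive(W, X, y, reg)
        W[idx] = old - h
        lm, _ = softmax_loss_naive(W, X, y, reg)
        W[idx] = old
        dW_num[idx] = (lp - lm) / (2 * h)

    print(np.max(np.abs(dW - dW_num)))   # should be tiny, on the order of 1e-8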
2. Vectorized softmax implementation
def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized version.
    Inputs and outputs are the same as softmax_loss_naive.
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    scores = np.dot(X, W)                               # class scores, shape (N, C)
    scores -= np.max(scores, axis=1, keepdims=True)     # shift for numerical stability
    scores = np.exp(scores)                             # exponentiate
    scores /= np.sum(scores, axis=1, keepdims=True)     # normalize rows to get softmax probabilities
    ds = np.copy(scores)                                # gradient of the loss w.r.t. the scores
    ds[np.arange(X.shape[0]), y] -= 1                   # subtract 1 at the correct-class entries
    dW = np.dot(X.T, ds)                                # gradient w.r.t. W
    loss = scores[np.arange(X.shape[0]), y]             # probabilities of the correct classes
    loss = -np.log(loss).sum()                          # cross-entropy
    loss /= X.shape[0]
    dW /= X.shape[0]
    loss += reg * np.sum(W * W)
    dW += 2 * reg * W
    return loss, dW
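As a usage sketch, the two implementations above should agree on the same random inputs (the small shapes below are arbitrary, not the assignment's CIFAR-10 sizes):

    import numpy as np

    np.random.seed(1)
    D, C, N = 100, 10, 50
    W = np.random.randn(D, C) * 0.0001
    X = np.random.randn(N, D)
    y = np.random.randint(C, size=N)

    loss_naive, dW_naive = softmax_loss_naive(W, X, y, reg=0.5)
    loss_vec, dW_vec = softmax_loss_vectorized(W, X, y, reg=0.5)

    print(abs(loss_naive - loss_vec))                 # should be ~0
    print(np.linalg.norm(dW_naive - dW_vec, 'fro'))   # should be ~0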
IV. Assignment Questions
1. Why do we expect our loss to be close to -log(0.1)? Explain briefly.
Your answer: After multiplying the weight matrix by 0.001, all of its entries are very small and close to 0, so every class score is also close to 0, i.e. the scores are roughly equal across classes. After softmax, the probabilities are therefore nearly uniform. Since this is a 10-class problem, each class gets probability about 0.1, and the cross-entropy loss is about -log(0.1).
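A tiny numerical check of this argument (hypothetical, with all scores set exactly to 0):

    import numpy as np
    s = np.zeros(10)                   # scores are all approximately 0
    p = np.exp(s) / np.exp(s).sum()    # softmax gives a uniform distribution
    print(p)                           # [0.1 0.1 ... 0.1]
    print(-np.log(p[0]))               # 2.302585..., i.e. -log(0.1)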
2.
Inline Question - True or False
It's possible to add a new datapoint to a training set that would leave the SVM loss unchanged, but this is not the case with the Softmax classifier loss.
Your answer: True
Your explanation: The multiclass SVM loss for one example is
$$L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + \Delta)$$
It is possible that the added datapoint is easy for the SVM to classify, so every margin term is clipped to 0 by the max and the SVM loss stays unchanged. The softmax classifier, by contrast, always produces a probability distribution for the new point and adds its cross-entropy to the loss, so the softmax loss always increases by some amount, even if that amount is very small.
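A small numerical illustration of this argument (the scores below are hypothetical, for one added datapoint whose correct class is 0):

    import numpy as np

    s = np.array([10.0, -5.0, -5.0])   # scores for the new datapoint, correct class = 0
    y = 0

    # Multiclass SVM loss: every margin is negative, so each hinge term is clipped to 0.
    svm_loss = np.sum(np.maximum(0, s - s[y] + 1.0) * (np.arange(len(s)) != y))
    print(svm_loss)                    # 0.0 -> the total SVM loss is unchanged

    # Softmax loss: the correct-class probability is < 1, so the loss is strictly positive.
    p = np.exp(s - s.max()); p /= p.sum()
    print(-np.log(p[y]))               # small but strictly positive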
V. Softmax Optimization
VI. Softmax Applications
References