Table of Contents
- Softmax Principles
- Notes on Some APIs
- Softmax Implementation
- Assignment Questions
- Softmax Optimization
- Softmax Applications
- References
I. Softmax Principles
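As a minimal sketch of what the implementations below compute (notation assumed here: an input matrix X of shape (N, D), weights W of shape (D, C), scores s = XW, labels y, and regularization strength reg):

$$p_{ik} = \frac{e^{s_{ik}}}{\sum_{j} e^{s_{ij}}}, \qquad L = -\frac{1}{N} \sum_{i=1}^{N} \log p_{i,\,y_i} + \text{reg} \cdot \sum W^2, \qquad \frac{\partial L}{\partial W} = \frac{1}{N} X^{\top}(P - Y) + 2 \cdot \text{reg} \cdot W$$

where P is the (N, C) matrix of softmax probabilities and Y is the one-hot encoding of y; the last expression is exactly the ds/dW computation used in the vectorized code below.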
II. Notes on Some APIs
The lambda function
Purpose: lambda defines an anonymous function.
lambda does not make the program run any faster; it only makes the code more concise.
https://blog.csdn.net/SeeTheWorld518/article/details/46959593
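A minimal self-contained illustration of the point above (the names square and pairs are just for this example):

    # An anonymous function bound to a name -- equivalent to a one-line def.
    square = lambda x: x * x
    print(square(4))                           # 16

    # lambda is most useful inline, e.g. as a sort key.
    pairs = [(1, 'b'), (0, 'c'), (2, 'a')]
    print(sorted(pairs, key=lambda p: p[1]))   # [(2, 'a'), (1, 'b'), (0, 'c')]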
III. Softmax Implementation
1. Naive softmax implementation
import numpy as np

def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops).
    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.
    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    for i in range(X.shape[0]):
        score = np.dot(X[i], W)
        score -= np.max(score)        # shift scores for numerical stability
        score = np.exp(score)         # exponentiate
        softmax_sum = np.sum(score)   # softmax denominator
        score /= softmax_sum          # normalize to get softmax probabilities
        # Accumulate the gradient.
        for j in range(W.shape[1]):
            if j != y[i]:
                dW[:, j] += score[j] * X[i]          # incorrect class: p_j * x_i
            else:
                dW[:, j] -= (1 - score[j]) * X[i]    # correct class: (p_j - 1) * x_i
        loss -= np.log(score[y[i]])   # cross-entropy for this example
    loss /= X.shape[0]                # average over the minibatch
    dW /= X.shape[0]                  # average over the minibatch
    loss += reg * np.sum(W * W)       # add the regularization term
    dW += 2 * reg * W
    return loss, dW
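As a quick sanity check (not part of the original assignment code), the analytic gradient from softmax_loss_naive can be compared against a centered-difference numerical gradient on small random data; the shapes and seed below are arbitrary:

    import numpy as np

    np.random.seed(0)
    D, C, N = 5, 3, 10
    W = np.random.randn(D, C) * 0.01
    X = np.random.randn(N, D)
    y = np.random.randint(C, size=N)
    reg = 0.1

    loss, dW = softmax_loss_naive(W, X, y, reg)

    # Centered-difference numerical gradient for comparison.
    h = 1e-5
    dW_num = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        lp, _ = softmax_loss_naive(W, X, y, reg)
        W[idx] = old - h
        lm, _ = softmax_loss_naive(W, X, y, reg)
        W[idx] = old
        dW_num[idx] = (lp - lm) / (2 * h)

    print(np.max(np.abs(dW - dW_num)))   # should be tiny, on the order of 1e-8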
2. Vectorized softmax implementation
def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized version.
    Inputs and outputs are the same as softmax_loss_naive.
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    scores = np.dot(X, W)                               # class scores, shape (N, C)
    scores -= np.max(scores, axis=1, keepdims=True)     # shift for numerical stability
    scores = np.exp(scores)                             # exponentiate
    scores /= np.sum(scores, axis=1, keepdims=True)     # normalize rows to get softmax probabilities
    ds = np.copy(scores)                                # gradient of the loss w.r.t. the scores
    ds[np.arange(X.shape[0]), y] -= 1                   # subtract 1 at the correct-class entries
    dW = np.dot(X.T, ds)                                # gradient w.r.t. W
    loss = scores[np.arange(X.shape[0]), y]             # probabilities of the correct classes
    loss = -np.log(loss).sum()                          # cross-entropy
    loss /= X.shape[0]
    dW /= X.shape[0]
    loss += reg * np.sum(W * W)
    dW += 2 * reg * W
    return loss, dW
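As a usage sketch, the two implementations above should agree on the same random inputs (the small shapes below are arbitrary, not the assignment's CIFAR-10 sizes):

    import numpy as np

    np.random.seed(1)
    D, C, N = 100, 10, 50
    W = np.random.randn(D, C) * 0.0001
    X = np.random.randn(N, D)
    y = np.random.randint(C, size=N)

    loss_naive, dW_naive = softmax_loss_naive(W, X, y, reg=0.5)
    loss_vec, dW_vec = softmax_loss_vectorized(W, X, y, reg=0.5)

    print(abs(loss_naive - loss_vec))                 # should be ~0
    print(np.linalg.norm(dW_naive - dW_vec, 'fro'))   # should be ~0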
IV. Assignment Questions
1. Why do we expect our loss to be close to -log(0.1)? Explain briefly.
Your answer: After multiplying the weight matrix by 0.001, all of its entries are very small and close to 0, so every class score is also close to 0, i.e. the scores are roughly equal across classes. After softmax, the probabilities are therefore nearly uniform. Since this is a 10-class problem, each class gets probability about 0.1, and the cross-entropy loss is about -log(0.1).
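A tiny numerical check of this argument (hypothetical, with all scores set exactly to 0):

    import numpy as np
    s = np.zeros(10)                   # scores are all approximately 0
    p = np.exp(s) / np.exp(s).sum()    # softmax gives a uniform distribution
    print(p)                           # [0.1 0.1 ... 0.1]
    print(-np.log(p[0]))               # 2.302585..., i.e. -log(0.1)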
2.
Inline Question - True or False
It's possible to add a new datapoint to a training set that would leave the SVM loss unchanged, but this is not the case with the Softmax classifier loss.
Your answer: True
Your explanation: The multiclass SVM loss for one example is
$$L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + \Delta)$$
It is possible that the added datapoint is easy for the SVM to classify, so every margin term is clipped to 0 by the max and the SVM loss stays unchanged. The softmax classifier, by contrast, always produces a probability distribution for the new point and adds its cross-entropy to the loss, so the softmax loss always increases by some amount, even if that amount is very small.
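A small numerical illustration of this argument (the scores below are hypothetical, for one added datapoint whose correct class is 0):

    import numpy as np

    s = np.array([10.0, -5.0, -5.0])   # scores for the new datapoint, correct class = 0
    y = 0

    # Multiclass SVM loss: every margin is negative, so each hinge term is clipped to 0.
    svm_loss = np.sum(np.maximum(0, s - s[y] + 1.0) * (np.arange(len(s)) != y))
    print(svm_loss)                    # 0.0 -> the total SVM loss is unchanged

    # Softmax loss: the correct-class probability is < 1, so the loss is strictly positive.
    p = np.exp(s - s.max()); p /= p.sum()
    print(-np.log(p[y]))               # small but strictly positive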
V. Softmax Optimization
VI. Softmax Applications
References