Stanford CS231n Computer Vision Training Camp ---- Softmax exercise

Part 1: Assignment tasks

  • implement a fully-vectorized loss function for the Softmax classifier
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation with numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

Here SGD refers to stochastic gradient descent.
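For reference, the loss implemented in Part 2 below is the averaged softmax cross-entropy with L2 regularization (λ is `reg` in the code):

$$
L = \frac{1}{N}\sum_{i=1}^{N} -\log\frac{e^{s_{y_i}}}{\sum_{j} e^{s_j}} + \lambda \sum_{k,l} W_{k,l}^2,
\qquad s = x_i W
$$

where $s = x_i W$ are the class scores of example $x_i$, and for one example the gradient with respect to column $W_{:,j}$ is $(p_j - \mathbb{1}[j = y_i])\,x_i^{\top}$, with $p$ the softmax of the scores.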

Part 2: Main code with comments

 #############################################################################
  # TODO: Compute the softmax loss and its gradient using explicit loops.     #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  for i in range(X.shape[0]):
      score = np.dot(X[i], W)
      score -= np.max(score)  # shift scores for numerical stability
      score = np.exp(score)  # exponentiate
      softmax_sum = np.sum(score)  # softmax denominator
      score /= softmax_sum  # normalized softmax probabilities
      # gradient: dW[:, j] += (p_j - 1{j == y[i]}) * X[i]
      for j in range(W.shape[1]):
        if j != y[i]:
          dW[:,j] += score[j]*X[i]
        else:
          dW[:,j] -= (1-score[j])*X[i]
      loss -= np.log(score[y[i]])  # cross-entropy for this example
  loss /= X.shape[0]  # average over the training examples
  dW /= X.shape[0]  # average over the training examples
  loss += reg*np.sum(W*W)  # regularization term
  dW += 2*reg*W
  #############################################################################
  #                          END OF YOUR CODE                                 #
  #############################################################################
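To verify the analytic gradient, it can be compared against a centered-difference numerical gradient at a few random entries of W. A minimal sketch, assuming the loop version above is wrapped as `softmax_loss_naive(W, X, y, reg)` returning `(loss, dW)` as in the assignment skeleton, and that `W`, `X_dev`, `y_dev` are the weight matrix and development split prepared earlier in the notebook; the helper `numeric_grad_at` is defined here purely for illustration:

import numpy as np

def numeric_grad_at(f, W, ix, h=1e-5):
    # Centered finite difference of the scalar function f at one entry of W.
    old = W[ix]
    W[ix] = old + h
    fxph = f(W)
    W[ix] = old - h
    fxmh = f(W)
    W[ix] = old  # restore the original value
    return (fxph - fxmh) / (2 * h)

# Compare analytic and numerical gradients at a few random entries.
loss, dW = softmax_loss_naive(W, X_dev, y_dev, 0.0)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
for _ in range(5):
    ix = tuple(np.random.randint(d) for d in W.shape)
    grad_numerical = numeric_grad_at(f, W, ix)
    print('numerical: %f analytic: %f' % (grad_numerical, dW[ix]))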

Inline Question 1:

Why do we expect our loss to be close to -log(0.1)? Explain briefly.

Your answer: Because W is initialized with small random values, the scores computed for every class are roughly the same, so after the softmax each class gets roughly equal probability. This is a 10-class problem, so each class has probability about 0.1, and the resulting cross-entropy is about -log(0.1).
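As a quick numeric check of this answer: with $C = 10$ classes and roughly uniform class probabilities,

$$L_i \approx -\log\frac{1}{10} = \log 10 \approx 2.30,$$

so an initial loss near 2.3 is the expected sanity-check value.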

  #############################################################################
  # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  scores = np.dot(X,W)
  scores -= np.max(scores, axis=1, keepdims=True)  # shift scores for numerical stability
  scores = np.exp(scores)  # exponentiate
  scores /= np.sum(scores, axis=1, keepdims=True)  # normalize to get softmax probabilities
  ds = np.copy(scores)
  ds[np.arange(X.shape[0]),y] -= 1
  dW = np.dot(X.T,ds)  # chain rule through S = XW
  loss = scores[np.arange(X.shape[0]),y]
  loss = -np.log(loss).sum()
  loss /= X.shape[0]  # average over the training examples
  dW /= X.shape[0]  # average over the training examples
  loss += reg * np.sum(W * W)  # regularization term
  dW += 2 * reg * W
  #############################################################################
  #                          END OF YOUR CODE                                 #
  #############################################################################
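The chain-rule step behind `dW = np.dot(X.T, ds)`: with scores $S = XW$, softmax probabilities $P$ (one row per example), and the one-hot label matrix $Y$, the unregularized cross-entropy satisfies

$$\frac{\partial L}{\partial S} = P - Y, \qquad \frac{\partial L}{\partial W} = X^{\top}(P - Y),$$

which is exactly what `ds[np.arange(X.shape[0]), y] -= 1` followed by `np.dot(X.T, ds)` computes, before dividing by the number of examples and adding the $2\lambda W$ regularization gradient.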
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e3, 5e3, 7e3]

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifer in best_softmax.                          #
################################################################################
from copy import deepcopy
for lr in learning_rates:
    for reg in regularization_strengths:
        softmax = Softmax()
        softmax.train(X_train, y_train, learning_rate=lr, reg=reg, num_iters=1500, batch_size=128)
        train_pred = softmax.predict(X_train)
        train_acc = np.mean(train_pred == y_train)
        val_pred = softmax.predict(X_val)
        val_acc = np.mean(val_pred == y_val)
        results[(lr, reg)] = [train_acc, val_acc]
        if val_acc > best_val:
            best_val = val_acc
            best_softmax = deepcopy(softmax)
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)
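After tuning, the selected model is normally evaluated once on the test split. A minimal sketch, assuming `X_test` and `y_test` have been prepared earlier in the notebook:

# Evaluate the best softmax classifier on the test set.
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('softmax on raw pixels final test set accuracy: %f' % test_accuracy)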

Inline Question - True or False

It's possible to add a new datapoint to a training set that would leave the SVM loss unchanged, but this is not the case with the Softmax classifier loss.

Your answer: True

Your explanation: From the SVM loss formula, the added data point may be easy for the SVM to classify, so every margin term is clipped to 0 by the max and the point contributes nothing to the loss. The softmax classifier, however, always produces a probability distribution and computes a cross-entropy from it, so its loss always increases by some amount, even if that amount is very small.
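In formulas: if the new point's correct-class score satisfies $s_{y_i} \ge s_j + 1$ for every other class $j$, its SVM hinge loss contribution is

$$\sum_{j \ne y_i} \max(0,\, s_j - s_{y_i} + 1) = 0,$$

whereas its softmax contribution $-\log\!\big(e^{s_{y_i}} / \sum_j e^{s_j}\big)$ is strictly positive, because the softmax probability of the correct class is always strictly less than 1.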

Reposted from blog.csdn.net/wxw060709/article/details/83505133