The focus here is a reading of collaborative filtering algorithms and, more importantly for me, filling in some background knowledge.
Descriptions of the relevant PyTorch functions are included along the way.
Loss function
Mean squared error
The least squares method is the basis of linear regression: the fitted curve should minimize the total distance between the data points and the regression line.
Euclidean distance is generally used as the metric.
$L(Y|f(X)) = \sum\limits_{i=1}^{N}(Y_i-f(X_i))^{2}$
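As a quick check of the formula, a minimal sketch in plain Python; the y_true and y_pred values are made up for illustration:

y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [1.0, 1.0, 1.0, 1.0]
# sum of squared residuals, exactly as in the formula above
squared_error = sum((y - f_x) ** 2 for y, f_x in zip(y_true, y_pred))
print(squared_error)  # 1 + 0 + 1 + 4 = 6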
Logarithmic (log) loss function: the loss function of logistic regression.
The samples are assumed to follow a Bernoulli distribution; one then writes down the likelihood function, takes its logarithm, and finds the extremum.
$L(Y|f(X)) = -\log P(Y|X)$
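A minimal sketch for a single Bernoulli-distributed sample, assuming p is the model's predicted probability of the positive class (the function name and values are illustrative):

import math

def log_loss(y, p):
    # -log P(Y|X): probability of the observed label under the model
    return -math.log(p if y == 1 else 1 - p)

print(log_loss(1, 0.9))  # ~0.105: confident and correct, small loss
print(log_loss(1, 0.1))  # ~2.303: confident and wrong, large loss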
Exponential loss function: AdaBoost uses the exponential loss function.
$L(Y|f(X)) = \exp[-Yf(X)]$
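A minimal sketch, assuming labels y in {-1, +1} and f(x) a signed classifier score as in AdaBoost (names and values are illustrative):

import math

def exp_loss(y, fx):
    # exp(-y * f(x)): small when the score agrees with the label, large otherwise
    return math.exp(-y * fx)

print(exp_loss(+1, 2.0))   # ~0.135: correct with a margin, small loss
print(exp_loss(+1, -2.0))  # ~7.389: wrong with a margin, large loss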
CODE
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

sample = Variable(torch.ones(2, 2))
a = torch.Tensor(2, 2)
a[0, 0] = 0
a[0, 1] = 1
a[1, 0] = 2
a[1, 1] = 3
target = Variable(a)
# sample: [[1, 1], [1, 1]]
# target: [[0, 1], [2, 3]]

# Basic usage pattern:
#   criterion = LossCriterion()   # the constructor takes its own parameters
#   loss = criterion(x, y)        # the criterion is called with inputs and targets
# The computed result is averaged over the mini-batch.

# L1Loss: mean absolute error between the predicted and the true values
criterion = nn.L1Loss()
loss = criterion(sample, target)  # 1

# SmoothL1Loss (Huber loss): squared loss for errors in (-1, 1), L1 loss otherwise
criterion = nn.SmoothL1Loss()
loss = criterion(sample, target)  # 0.625

# MSELoss: mean of the squared differences between the predicted and the true values
criterion = nn.MSELoss()
loss = criterion(sample, target)  # 1.5

# nn.NLLLoss: negative log-likelihood loss
#   loss(x, class) = -x[class]
#   loss(x, class) = -weights[class] * x[class]   when weights are specified
m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size nBatch x nClasses = 3 x 5
input = torch.autograd.Variable(torch.randn(3, 5), requires_grad=True)
# each element in target has to satisfy 0 <= value < nClasses
target = torch.autograd.Variable(torch.LongTensor([1, 0, 4]))
output = loss(m(input), target)
output.backward()
Optimization
SGD: stochastic gradient descent.
Each iteration computes the gradient on a small mini-batch, which makes it usable on large datasets.
Momentum: SGD with momentum.
Accumulated past gradients are used as momentum for the current update.
AdaGrad: constrains the learning rate per parameter.
Well suited to sparse gradients, but sensitive to the choice of the global learning rate.
Adam: dynamically adjusts the learning rate of each parameter using first- and second-moment estimates of the gradients.
After bias correction, the learning rate at each iteration stays within a definite range, so the parameter updates are relatively stable.
Memory requirements are small; it works well on large datasets and in high-dimensional spaces, and applies to most non-convex optimization problems. (A hand-rolled sketch of all four update rules follows this list.)
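Before the PyTorch API below, a toy single-parameter sketch of the four update rules just described; this is written from scratch for illustration, not the torch.optim implementation, and the hyperparameter values (lr, mu, eps) are made up:

import math

lr, mu, eps = 0.1, 0.9, 1e-8

def sgd(w, g):
    # plain stochastic gradient descent step
    return w - lr * g

def sgd_momentum(w, g, v):
    v = mu * v + g                  # accumulate past gradients as momentum
    return w - lr * v, v

def adagrad(w, g, G):
    G += g ** 2                     # per-parameter sum of squared gradients constrains the rate
    return w - lr * g / (math.sqrt(G) + eps), G

def adam(w, g, m, v, t, beta1=0.9, beta2=0.999):
    m = beta1 * m + (1 - beta1) * g         # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g ** 2    # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)            # bias correction, t is the step count from 1
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v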
CODE
import torch.optim as optim

# Usage
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)

# Base class: torch.optim.Optimizer(params, defaults)
#   params: an iterable of Variables or dicts specifying which parameters to optimize
#   defaults: a dict holding default values for the optimization options

# Methods:
#   load_state_dict(state_dict)  load the optimizer state
#   state_dict()                 return the optimizer state as a dict;
#                                'state' holds the saved optimization state,
#                                'param_groups' holds all parameter groups
#   zero_grad()                  clear the gradients of all optimized Variables
#   optimizer.step()             perform a single optimization step
#   optimizer.step(closure)      for algorithms that re-evaluate the function several
#                                times, pass a closure; the closure must clear the gradients
for input, target in dataset:
    def closure():
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        return loss
    optimizer.step(closure)

# SGD: torch.optim.SGD(params, lr=0.01, momentum=0, dampening=0, weight_decay=0, nesterov=False)
#   params: parameters to be optimized
#   lr (float): learning rate
#   momentum (float, optional): momentum factor (default 0)
#   dampening (float, optional): dampening for momentum (default 0)
#   weight_decay (float, optional): weight decay, an L2 penalty (default 0)
#   nesterov (bool, optional): enables Nesterov momentum (default False)

# Adagrad: torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0)
#   lr_decay (float, optional): learning rate decay (default 0)

# Adam: torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
#   betas (Tuple[float, float], optional): coefficients used for computing running
#     averages of the gradient and its square (default: (0.9, 0.999))
#   eps (float, optional): term added to the denominator for numerical stability (default: 1e-8)
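To tie the loss functions and the optimizer API together, a minimal end-to-end training loop; the model, data, and hyperparameters here are made up for illustration, and the snippet targets a recent PyTorch version (no Variable wrapper needed):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(5, 1)                 # toy linear regression model
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

x = torch.randn(16, 5)                  # 16 samples, 5 features (made-up data)
y = torch.randn(16, 1)                  # made-up regression targets

for epoch in range(100):
    optimizer.zero_grad()               # clear accumulated gradients
    loss = criterion(model(x), y)
    loss.backward()                     # compute gradients
    optimizer.step()                    # apply the SGD-with-momentum update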