[Reading Notes] Algorithm Practice for Recommendation Systems (1)

The focus here is a reading of collaborative filtering algorithms and, more importantly for me, some of the surrounding knowledge.

Before that, notes on some of the PyTorch functions involved.

 


 

Loss function

Mean squared error

The least squares method: the approach used in linear regression. The fitted curve should minimize the total distance between the data points and the regression line.

The Euclidean distance is generally used as the metric.

$L(Y|f(X)) = \sum\limits_{i=1}^{N}(Y_{i}-f(X_{i}))^{2}$
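
A quick worked check, using the same prediction/target pair as the MSELoss code below: with $f(X) = (1, 1, 1, 1)$ and $Y = (0, 1, 2, 3)$, the loss is $(0-1)^{2} + (1-1)^{2} + (2-1)^{2} + (3-1)^{2} = 6$ (nn.MSELoss averages instead of summing, giving $6/4 = 1.5$).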

 

Log (logarithmic) loss function: the loss function of logistic regression is the logarithmic loss.

Assume the samples follow a Bernoulli distribution, write down the likelihood function, take its logarithm, and find the extremum.

$L(Y|f(X)) = -\log P(Y|X)$
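
For instance (numbers made up for illustration), if a logistic regression model outputs $P(Y=1|X) = 0.8$ and the true label is $Y = 1$, the loss is $-\log 0.8 \approx 0.22$; if the true label is $Y = 0$, the loss is $-\log 0.2 \approx 1.61$.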

 

Exponential loss function: AdaBoost uses the exponential loss function.

$L(Y|f(X)) = \exp[-Yf(X)]$
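
For instance (scores chosen only for illustration), with true label $Y = +1$: a confident correct score $f(X) = 2$ gives $e^{-2} \approx 0.14$, while an equally confident wrong score $f(X) = -2$ gives $e^{2} \approx 7.39$, so misclassifications are penalized exponentially.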

 

CODE

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

sample = Variable(torch.ones(2, 2))
a = torch.Tensor(2, 2)
a[0, 0] = 0
a[0, 1] = 1
a[1, 0] = 2
a[1, 1] = 3
target = Variable(a)
#sample: [[1,1],[1,1]]
#target: [[0,1],[2,3]]


# Basic usage
criterion = LossCriterion()   # the constructor takes its own parameters
loss = criterion(x, y)        # the criterion takes the prediction and target when called
# the result is averaged over the mini-batch


# L1Loss
# mean absolute error between the predicted value and the true value
criterion = nn.L1Loss()
loss = criterion(sample, target)
# 1


# SmoothL1Loss
# Huber loss: squared loss when the error is in (-1, 1), L1 loss otherwise
criterion = nn.SmoothL1Loss()
loss = criterion(sample, target)
# 0.625


# MSELoss
# mean of the squared differences between the true value and the predicted value
criterion = nn.MSELoss()
loss = criterion(sample, target)
# 1.5


# nn.NLLLoss
# negative log-likelihood loss
# loss(x, class) = -x[class]
# loss(x, class) = -weights[class] * x[class] when weights are specified

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size nBatch x nClasses = 3 x 5
input = torch.autograd.Variable(torch.randn(3, 5), requires_grad=True)
# each element in target must satisfy 0 <= value < nClasses
target = torch.autograd.Variable(torch.LongTensor([1, 0, 4]))
output = loss(m(input), target)
output.backward()

 

 


 

Optimization

SGD: stochastic gradient descent.

At each iteration the gradient is computed on a small mini-batch, which makes it suitable for large datasets.
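
A minimal plain-Python sketch of the idea (not PyTorch's implementation; the toy data, batch size, and learning rate are made up for illustration):

import random

# toy data generated from y = 2x, so the optimal weight is 2.0
data = [(float(x), 2.0 * x) for x in range(10)]

w = 0.0
lr = 0.01
for step in range(200):
    batch = random.sample(data, 2)                                   # small mini-batch
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)   # gradient of the mini-batch MSE w.r.t. w
    w -= lr * grad                                                   # SGD update
print(w)  # close to 2.0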

 

 

 

Momentum: SGD with momentum.

Gradients are accumulated into a momentum term, which is then used as the update direction.
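
A minimal sketch of that update rule, assuming the common formulation v = mu * v + grad followed by w = w - lr * v (this matches SGD with momentum when dampening is 0); grad_fn in the commented usage is a hypothetical gradient function:

def momentum_step(w, v, grad, lr=0.01, mu=0.9):
    # accumulate the gradient into the velocity, then step along the velocity
    v = mu * v + grad
    w = w - lr * v
    return w, v

# hypothetical usage with a scalar parameter:
# w, v = 1.0, 0.0
# for _ in range(100):
#     w, v = momentum_step(w, v, grad_fn(w))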

 

 

AdaGrad: constrains the learning rate per parameter.

Well suited to sparse gradients, but sensitive to the global learning rate.
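
A minimal sketch of the AdaGrad idea, a simplification rather than torch.optim.Adagrad itself: the squared gradient is accumulated per parameter and divides the step, so rarely-updated (sparse-gradient) parameters keep a larger effective learning rate.

import math

def adagrad_step(w, g_sum, grad, lr=0.01, eps=1e-10):
    g_sum = g_sum + grad ** 2                       # accumulate squared gradients
    w = w - lr * grad / (math.sqrt(g_sum) + eps)    # per-parameter scaled step
    return w, g_sum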

 

 

Adam: dynamically adjusts each parameter's learning rate using first- and second-moment estimates of the gradient.

After bias correction, the learning rate at each iteration stays within a definite range, so the parameter values are relatively stable.

It has small memory requirements, suits large datasets and high-dimensional spaces, and applies to most non-convex optimization problems.
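
A minimal sketch of the Adam update, simplified from the published algorithm rather than torch.optim.Adam itself: exponential moving averages estimate the first and second moments of the gradient, bias correction compensates for their zero initialization, and the corrected moments set a per-parameter step size.

import math

def adam_step(w, m, v, grad, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)              # bias correction; t is the step count, starting at 1
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v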

 

 

CODE

import torch.optim as optim

# usage
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)

# Base class
torch.optim.Optimizer(params, defaults)
# params: an iterable of Variables or dicts specifying which parameters should be optimized
# defaults: a dict of default values for the optimization options

# Methods:
load_state_dict(state_dict)   # load the optimizer state
state_dict()                  # return the state as a dict: 'state' holds the saved optimizer state, 'param_groups' holds all parameter groups
zero_grad()                   # clear the gradients of all optimized Variables
optimizer.step()              # a single optimization step
optimizer.step(closure)       # for algorithms that re-evaluate the function multiple times, pass a closure; the closure should clear the gradients, recompute the loss, and return it
for input, target in dataset:
    def closure():
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        return loss
    optimizer.step(closure)



# SGD
torch.optim.SGD(params, lr=0.01, momentum=0, dampening=0, weight_decay=0, nesterov=False)
# params: the parameters to optimize
# lr (float): learning rate
# momentum (float, optional): momentum factor (default 0)
# dampening (float, optional): dampening for momentum (default 0)
# weight_decay (float, optional): weight decay (L2 penalty) (default 0)
# nesterov (bool, optional): enables Nesterov momentum (default False)

# Adagrad
torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0)
# lr_decay (float, optional): learning rate decay (default 0)

# Adam
torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
# betas (Tuple[float, float], optional): coefficients for computing running averages of the gradient and its square (default: (0.9, 0.999))
# eps (float, optional): term added to the denominator to improve numerical stability (default: 1e-8)

 

Origin www.cnblogs.com/Asumi/p/12470611.html