A Detailed Explanation of PyTorch's Automatic Differentiation Mechanism

Table of contents

1. Automatic differentiation

1.1 Gradient calculation

1.1.1 First derivative

1.1.2 Second derivative

1.1.3 Vectors

1.2 Linear regression in practice


1. Automatic differentiation

In deep learning, we usually need to train a model to minimize a loss function. This is done with optimization algorithms such as gradient descent. The gradient is the rate of change of a function at a point and tells us how to adjust the model's parameters to reduce the loss. Automatic differentiation is a technique for computing gradients that lets us define models without manually deriving gradient formulas. PyTorch provides automatic differentiation, which makes gradient computation simple and efficient.
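
As a minimal illustrative sketch of that idea (the toy loss and the learning rate of 0.1 are made up for this example), one hand-written gradient-descent step with autograd looks like this:

import torch

w = torch.tensor(5., requires_grad=True)   # a model parameter
loss = (w - 2) ** 2                        # a toy loss, minimized at w = 2

loss.backward()                            # autograd computes dloss/dw = 2 * (w - 2) = 6
with torch.no_grad():
    w -= 0.1 * w.grad                      # one gradient-descent step: w becomes 5 - 0.6 = 4.4
print(w)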

PyTorch uses a dynamic computation graph: the graph is built as the operations are executed, and results can be inspected at any time. There are only two kinds of elements in a PyTorch computation graph: data (tensors) and operations.

Operations include addition, subtraction, multiplication, division, square roots, exponentials, logarithms, trigonometric functions, and other differentiable operations.

Data nodes are divided into leaf nodes and non-leaf nodes. Leaf nodes are tensors created directly by the user and do not depend on other nodes. The difference between the two is that after backpropagation finishes, the gradients of non-leaf nodes are released and only the gradients of leaf nodes are retained, which saves memory. If you want to keep the gradient of a non-leaf node, you can call its retain_grad() method.
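
As a minimal sketch of this behavior (the variable names are illustrative), the intermediate tensor below only keeps its gradient because retain_grad() is called before backward():

import torch

x = torch.tensor(2., requires_grad=True)  # leaf node created by the user
y = x * 3                                 # non-leaf node produced by an operation
y.retain_grad()                           # keep y's gradient after backward()
z = y ** 2
z.backward()

print(x.is_leaf, y.is_leaf)  # True False
print(x.grad)                # dz/dx = 2 * y * 3 = 36
print(y.grad)                # dz/dy = 2 * y = 12 (only kept because of retain_grad)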

A torch.Tensor has the following relevant attributes:

  • requires_grad — whether the tensor requires a gradient (i.e. whether it can be differentiated)
  • grad_fn — the operation (function) that produced the tensor
  • is_leaf — whether the tensor is a leaf node
  • grad — the computed gradient value

For the requires_grad attribute: leaf nodes created by the user default to False, a non-leaf node has requires_grad=True whenever at least one of its inputs requires a gradient, and the weights in a neural network default to True. A rule of thumb for judging which nodes end up True is whether there is a differentiable path from the leaf node whose gradient you need up to the loss node. When we want to compute the gradient of a Tensor, we first need to set its requires_grad attribute to True. There are two main ways to do this:

x = torch.tensor(1.).requires_grad_()     # method 1: the in-place requires_grad_() method

x = torch.tensor(1., requires_grad=True)  # method 2: the requires_grad keyword argument

To summarize:

(1) Set the requires_grad keyword argument in torch.tensor().

(2) Check whether a tensor requires a gradient: the x.requires_grad attribute.

(3) Make a leaf variable require a gradient: the x.requires_grad_() method.

(4) The automatic differentiation method y.backward(): calling backward() directly computes gradients only for the leaf nodes of the computation graph.

(5) Read the resulting gradient: the x.grad attribute.
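
Putting these pieces together, here is a small illustrative sketch that inspects requires_grad, grad_fn, is_leaf and grad on a tiny graph:

import torch

x = torch.tensor(1., requires_grad=True)
y = x * 2 + 1        # non-leaf node, created by operations

print(x.requires_grad, y.requires_grad)  # True True
print(x.grad_fn, y.grad_fn)              # None <AddBackward0 object ...>
print(x.is_leaf, y.is_leaf)              # True False

y.backward()
print(x.grad)                            # tensor(2.)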

1.1 Gradient calculation

The core of automatic differentiation is the backpropagation algorithm. Backpropagation computes gradients efficiently by applying the chain rule to every differentiable operation and then using these gradients to update the parameters. Once we create tensors that require gradients, PyTorch automatically tracks all operations involving them and builds a computation graph. The computation graph is a directed acyclic graph that represents the dependencies between tensors during the computation.
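
As a purely illustrative sketch of that graph, the grad_fn attribute of each intermediate tensor records the operation that produced it, and next_functions links back toward the inputs, which is the path the backward pass follows:

import torch

a = torch.tensor(2., requires_grad=True)
b = a * 3          # MulBackward0
c = b + 1          # AddBackward0
d = c ** 2         # PowBackward0

print(d.grad_fn)                 # <PowBackward0 ...>
print(d.grad_fn.next_functions)  # links back to AddBackward0
d.backward()
print(a.grad)                    # chain rule: dd/da = 2*(3a+1)*3 = 42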

1.1.1 First derivative

Let's start with an example: z = w * x + b

import torch

x = torch.tensor(1., requires_grad=True)
b = torch.tensor(2., requires_grad=True)
w = torch.tensor(3., requires_grad=True)
z = w * x + b
z.backward()    # backpropagation
print(x.grad)   # gradient with respect to x
print(w.grad)   # gradient with respect to w
print(b.grad)   # gradient with respect to b

Running this prints gradients of 3, 1 and 1 for x, w and b respectively, since ∂z/∂x = w = 3, ∂z/∂w = x = 1, and ∂z/∂b = 1.

For x, b, and w to support differentiation, they must be floating-point tensors; that is, when giving the initial values we need to add a decimal point ("."). Otherwise an error is reported.
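
As an illustration, the following sketch shows the failure with an integer tensor; the exact error wording may vary across PyTorch versions:

import torch

try:
    x = torch.tensor(1, requires_grad=True)   # integer tensor: gradients are not supported
except RuntimeError as e:
    print(e)  # e.g. "Only Tensors of floating point and complex dtype can require gradients"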

1.1.2 Second derivative

import torch

x = torch.tensor(2.).requires_grad_()
y = torch.tensor(3.).requires_grad_()
z = x * x * y
z.backward(create_graph=True)  # keep the graph for a second backward pass; x.grad = 2xy = 12
print(x.grad)
x.grad.data.zero_()  # backward() accumulates gradients by default, so clear the previous gradient manually
x.grad.backward()    # the first derivative with respect to x is 2xy; backpropagate it again to get 2y
print(x.grad)

The first print shows a gradient of 12 and the second shows 6: the first derivative is ∂z/∂x = 2xy = 12, and differentiating 2xy with respect to x again gives 2y = 6.
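
An alternative sketch for higher-order derivatives (not the approach used above, but a common one) is torch.autograd.grad, which returns the gradient directly instead of accumulating it into .grad:

import torch

x = torch.tensor(2., requires_grad=True)
y = torch.tensor(3., requires_grad=True)
z = x * x * y

# first derivative dz/dx = 2xy, keeping the graph so it can be differentiated again
dz_dx, = torch.autograd.grad(z, x, create_graph=True)
print(dz_dx)      # 12

# second derivative d2z/dx2 = 2y
d2z_dx2, = torch.autograd.grad(dz_dx, x)
print(d2z_dx2)    # 6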

1.1.3 Vectors

By default, PyTorch can only differentiate a [scalar] with respect to a [scalar], or a [scalar] with respect to a [vector/matrix].

In deep learning, differentiation is applied to the loss function. The loss function is generally a scalar, while the parameters are often vectors or matrices.
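
If you do call backward() on a non-scalar tensor, you have to pass a gradient argument of the same shape (the weights of a vector-Jacobian product); a minimal illustrative sketch:

import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x * 2                       # y is a vector, so y.backward() alone would raise an error

y.backward(torch.ones_like(y))  # supply the "upstream" gradient explicitly
print(x.grad)                   # tensor([2., 2., 2.])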

For example, consider a simple neural network with an input layer of 3 nodes and an output layer of one node. For a single sample we have:

X = (x1, x2, x3) = (1.5, 2.5, 3.5), so X has shape (1, 3). The weight matrix of the output layer is W = (w1, w2, w3)^T = (0.2, 0.4, 0.6)^T (the initialized weights, where T denotes the transpose), so W has shape (3, 1). The bias term is b = 0.1, a scalar. We can then build the model:

Y = XW + b, where W and b are the variables whose gradients we need; Y is a scalar, W is a vector, b is a scalar, and W and b are leaf nodes.

Expanding the above gives:

Y = x1*w1 + x2*w2 + x3*w3 + b

import torch

# Build the multivariate function Y = XW + b = x1*w1 + x2*w2 + x3*w3 + b;
# X does not require a gradient, while W and b do
X = torch.tensor([1.5, 2.5, 3.5], requires_grad=False)
W = torch.tensor([0.2, 0.4, 0.6], requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)
Y = torch.add(torch.dot(X, W), b)

# Differentiate via the backward() function
Y.backward()

# Inspect the derivatives, i.e. the gradients
print(W.grad)
print(b.grad)

The output is W.grad = tensor([1.5000, 2.5000, 3.5000]) and b.grad = tensor(1.), since ∂Y/∂W = X and ∂Y/∂b = 1.

1.2 Linear regression in practice

We define the linear equation y = 2*x + 1. The following example implements a linear regression model in PyTorch and trains it using automatic differentiation:


import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim

x_values=[i for i in range(11)]
x_train=np.array(x_values,dtype=np.float32)
x_train=x_train.reshape(-1,1)


y_values=[2*i +1 for i in x_values]
y_values=np.array(y_values,dtype=np.float32)
y_train=y_values.reshape(-1,1)


# Linear regression is simply a fully connected layer with no activation function
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)



# Train on the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Create the model instance and the optimizer
model = LinearRegression()
model.to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Define the loss function
criterion = nn.MSELoss()
for epoch in range(100):
    # Prepare the training data as tensors
    inputs = torch.from_numpy(x_train).to(device)
    targets = torch.from_numpy(y_train).to(device)
    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, targets)

    # Backward pass and optimizer update
    # Clear the gradients at every iteration
    optimizer.zero_grad()
    # Backpropagation
    loss.backward()
    # Update the weight parameters
    optimizer.step()
    # Print the loss every 10 epochs
    if epoch%10==0:
        print("epoch {}, loss {}".format(epoch,loss.item()))


# Use the trained model to make predictions
predicted=model(torch.from_numpy(x_train).to(device))
print(predicted)
print(targets)

In the example above, we first create a simple linear regression model, LinearRegression, and build a dataset of 11 samples. Then we define the loss function criterion and the optimizer optimizer, and train the model in the training loop.

During training, the printed loss value decreases steadily as the epochs progress.

Comparing the model's output with the label values after training: the first printed tensor is the model's predictions and the second is the labels, and the predictions closely track the labels.
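
One small note on the prediction step above: since no gradients are needed at inference time, it is common (though not required by the example) to disable gradient tracking with torch.no_grad(); a minimal sketch reusing the variables from the code above:

with torch.no_grad():
    predicted = model(torch.from_numpy(x_train).to(device))
print(predicted.cpu().numpy())  # move back to the CPU and convert to NumPy for easier comparison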

That concludes this article.

Origin: blog.csdn.net/qq_43649937/article/details/131783905