ch01-PyTorch basic concepts

0 Preface


1. Introduction to PyTorch

(Omitted.)


2. Environment configuration

1) Verify that the installation works

import torch
a=torch.ones(2,2)
a
tensor([[1., 1.],
        [1., 1.]])

2) Check the PyTorch version

print("hello pytorch {}".format(torch.__version__))

3) Check whether GPU is supported

print (torch.cuda.is_available())
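
A common follow-up, once the check passes, is to select the device once and reuse it everywhere (a minimal sketch, not part of the original lesson):

# use the GPU if it is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)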

3. Introduction and creation of tensors

3.1. Concept of tensor: multidimensional array


Tensor and Variable:


  • Variable: mainly used to wrap a Tensor and support automatic differentiation. It is the data type in torch.autograd. Variable was an important data structure before PyTorch 0.4.0; starting from 0.4.0 it has been merged into Tensor.

  • data: encapsulated Tensor

  • grad: gradient of data

  • grad_fn: the Function that created the Tensor; it is the key to automatic differentiation.

  • requires_grad: indicates whether gradient is required

  • is_leaf: indicates whether the tensor is a leaf node

Starting from PyTorch 0.4.0, Variable has been merged into Tensor. In addition to the attributes above, a Tensor also has:

  • dtype: the data type of the tensor; there are three categories and nine types in total, e.g. torch.FloatTensor, torch.cuda.FloatTensor

  • shape: The shape of the tensor. Such as: (64, 3, 224, 224)

  • device: the device the tensor is stored on (a quick check of these attributes is sketched below)
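
A quick sketch (for illustration only) that checks the three attributes above on a small tensor:

import torch

t = torch.ones(64, 3, 224, 224)
print(t.dtype)    # torch.float32
print(t.shape)    # torch.Size([64, 3, 224, 224])
print(t.device)   # cpu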

3.2. Creation of tensors

3.2.1. Create directly
  • 1.torch.tensor()
torch.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False)
Function: create a tensor from data

data: the data; can be a list or a NumPy ndarray
dtype: data type; defaults to the same type as data
device: the device the tensor is on, cuda or cpu
requires_grad: whether gradients are required
pin_memory: whether to store the tensor in pinned (page-locked) memory
import torch
import numpy as np

# Create tensors via torch.tensor

flag = True

if flag:
    arr = np.ones((3, 3))
    print("type of data:", arr.dtype)

    t = torch.tensor(arr, device='cuda')
    print(t)
type of data: float64
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], device='cuda:0', dtype=torch.float64)

Here, cuda means the tensor is placed on a GPU, and 0 is the GPU index; since there is only one GPU, the index is 0.

  • 2. Create tensor from numpy: torch.from_numpy(ndarray)
    Note: shared memory. The tensor created by torch.from_numpy shares memory with the original ndarray: when the data of one is modified, the other one is modified as well.
# Create tensors via torch.from_numpy(ndarray)
arr = np.array([[1, 2, 3], [4, 5, 6]])
t = torch.from_numpy(arr)
print("numpy array: ", arr)
print("tensor : ", t)

print("\n修改arr")
arr[0, 0] = 0
print("numpy array: ", arr)
print("tensor : ", t)

print("\n修改tensor")
t[0, 0] = -1
print("numpy array: ", arr)
print("tensor : ", t)
numpy array:  [[1 2 3]
 [4 5 6]]
tensor :  tensor([[1, 2, 3],
        [4, 5, 6]], dtype=torch.int32)

modify arr
numpy array:  [[0 2 3]
 [4 5 6]]
tensor :  tensor([[0, 2, 3],
        [4, 5, 6]], dtype=torch.int32)

modify tensor
numpy array:  [[-1  2  3]
 [ 4  5  6]]
tensor :  tensor([[-1,  2,  3],
        [ 4,  5,  6]], dtype=torch.int32)
3.2.2. Create based on numerical values
  • 1.torch.zeros(): Create an all-zero tensor according to size
    • size: the shape of the tensor, such as (3, 3) or (3, 224, 224)
    • out: the output tensor; the generated tensor is also assigned to this variable
    • layout: memory layout, e.g. torch.strided (default) or torch.sparse_coo (usually set when working with sparse matrices to improve access efficiency)
    • device: the device the tensor is on, GPU or CPU
    • requires_grad: whether gradients are required
torch.zeros(*size, out=None, dtype=None, 
	layout=torch.strided, device=None, requires_grad=False)

As the sketch below shows, the returned tensor t and out_t hold the same data and are in fact the same object: the out argument simply assigns the generated tensor to another variable as well.
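
A minimal sketch of the out argument (for illustration only):

out_t = torch.tensor([1])
t = torch.zeros((3, 3), out=out_t)
print(t, '\n', out_t)
print(id(t) == id(out_t))   # True: t and out_t are the same tensor object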

  • 2.torch.zeros_like()
    • Function: Create an all-zero tensor with the same shape as input
    • input: the tensor whose shape is used
    • dtype: data type
    • layout: memory layout
torch.zeros_like(input, dtype=None, layout=None, device=None, requires_grad=False)
  • 3.torch.ones()
  • 4.torch.ones_like()
  • 5.torch.full()
  • 6.torch.full_like()
    • Function: Create a tensor filled with a specified value, with the given shape (full) or the shape of input (full_like)
    • size: the shape of the tensor, such as (3, 3)
    • fill_value: the value to fill the tensor with
t = torch.full((3, 3), 5)
print(t)

tensor([[5, 5, 5],
        [5, 5, 5],
        [5, 5, 5]])
  • 7.torch.arange(), creates an arithmetic (evenly spaced) sequence on the interval [start, end)
    • Function: Create a 1-dimensional arithmetic-sequence tensor
    • Note: the value range is [start, end), i.e. end is excluded
    • start: starting value of the sequence
    • end: end value of the sequence (exclusive)
    • step: the common difference of the sequence, default is 1
t = torch.arange(start=0, end=100, step=1, out=None, dtype=None, 
	layout=torch.strided, device=None, requires_grad=False)
t = torch.arange(2, 10, 2)
print(t)
# tensor([2, 4, 6, 8])
  • 8.torch.linspace(), creates an evenly spaced sequence on the interval [start, end]
    • Note: step is a step size, whereas steps is the length of the sequence
    • Function: Create an evenly spaced 1-dimensional tensor
    • start: the starting value of the sequence; end: the ending value of the sequence; steps: the length of the sequence (note: the length, not the step size)
    • The step size is (end - start) / (steps - 1)
t = torch.linspace(start=0, end=100, steps=5, out=None, dtype=None, 
	layout=torch.strided, device=None, requires_grad=False)
t = torch.linspace(2, 10, 6)
print(t)

# tensor([ 2.0000,  3.6000,  5.2000,  6.8000,  8.4000, 10.0000])
  • 9.torch.logspace(), creates a logarithmically spaced 1-dimensional tensor
    • Note: steps is the length of the sequence; the base defaults to 10
    • start: the starting exponent (the first value is base^start); end: the ending exponent; steps: the length of the sequence; base: the base, default 10
t = torch.logspace(start=0, end=100, steps=5, base=10, out=None, 
	dtype=None, layout=torch.strided, device=None, requires_grad=False)
  • 10.torch.eye(), creates an identity matrix (2-dimensional tensor), as sketched below
    • Note: defaults to a square matrix; n: number of rows, m: number of columns
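
A minimal sketch of torch.eye (for illustration only):

t = torch.eye(3)      # 3x3 identity matrix
t2 = torch.eye(3, 5)  # 3x5 matrix with ones on the main diagonal
print(t)
print(t2)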
3.2.3. Create tensors based on probability distributions
  • 1.torch.normal(): Generate normal distribution (Gaussian distribution), mean: mean, std: standard deviation
    • Four modes: mean scalar / std scalar; mean scalar / std tensor; mean tensor / std scalar; mean tensor / std tensor. The latter three are used in the same way: each element is sampled with the mean and std at the corresponding position, as shown below.
torch.normal(mean, std, out=None)
torch.normal(mean, std, size, out=None)
# the mean and std both are tensors
mean = torch.arange(1, 5, dtype=torch.float)
std = torch.arange(1, 5, dtype=torch.float)
t_normal = torch.normal(mean, std)
print("mean:{}\nstd:{}".format(mean, std))
print(t_normal)
# As the result shows, each element of the generated tensor is sampled
# using the mean and std at the corresponding position.
# mean:tensor([1., 2., 3., 4.])
# std:tensor([1., 2., 3., 4.])
# tensor([ 0.4750,  3.6384, -2.1488,  5.3180])

It should be noted that when mean and std are both scalars, the generated size needs to be specified.

# mean: scalar std: scalar
t_normal = torch.normal(0., 1., size=(4,))
print(t_normal)
# tensor([0.6614, 0.2669, 0.0617, 0.6213])
  • 2.torch.randn()
  • 3.torch.randn_like(), generates standard normal distribution
    • Note: size refers to the shape of the tensor
    • Function: Generate standard normal distribution (mean 0, variance 1)
    • size : The shape of the tensor.
torch.randn(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
  • 4.torch.rand()
  • 5.torch.rand_like(), generates a uniform distribution on the interval [0, 1]
  • 6.torch.randint()
  • 7.torch.randint_like(), generates integer uniform distribution in [low, high)
    • Function: Interval [low, high) generates uniform distribution of integers
    • size : the shape of the tensor
  • 8.torch.randperm(), generates a random permutation from 0–n-1
    • Function: Generate random permutations from 0 to n-1
    • n : the length of the tensor
  • 9.torch.bernoulli(), generates samples from a Bernoulli distribution (a few quick sketches of these functions follow below)
    • Function: using input as the probability, draw samples from a Bernoulli distribution (0-1 distribution, two-point distribution)
    • input: the probability values
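
A few minimal sketches of these probability-based creation functions (for illustration only; the sampled values differ from run to run):

t_rand = torch.rand(2, 3)                  # uniform distribution on [0, 1)
t_randint = torch.randint(0, 10, (2, 3))   # integer uniform distribution on [0, 10)
t_perm = torch.randperm(5)                 # a random permutation of 0..4
probs = torch.tensor([0.1, 0.5, 0.9])
t_bern = torch.bernoulli(probs)            # 0/1 samples drawn with the given probabilities
print(t_rand, t_randint, t_perm, t_bern, sep="\n")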

4. Tensor operations and linear regression

4.1. Tensor operations: concatenation, splitting, indexing and transformation

4.1.1. Concatenation
  • torch.cat(): Concatenate tensors along an existing dimension dim
    • Function: concatenate tensors along dimension dim
    • tensors: a sequence of tensors
    • dim: the dimension along which to concatenate
  • torch.stack(): Concatenate tensors along a newly created dimension dim
    • Function: concatenate along a newly created dimension dim
    • tensors: a sequence of tensors
    • dim: the dimension to create and stack along
t = torch.ones((2, 3))

t_0 = torch.cat([t, t], dim=0)
t_1 = torch.stack([t, t], dim=0)

print(t_0)
print(t_0.shape)
print(t_1)
print(t_1.shape)
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
torch.Size([4, 3])
tensor([[[1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.]]])
torch.Size([2, 2, 3])

Compared with cat, stack concatenates along a newly created dimension.

4.1.2. Splitting
  • torch.chunk(input, chunks, dim): Split the tensor evenly along dimension dim
    • Function: split the tensor evenly along dimension dim
    • Return value: a tuple of tensors
    • Note: if the dimension is not evenly divisible, the last tensor is smaller than the others
    • input: the tensor to split
    • chunks: the number of chunks to split into
    • dim: the dimension along which to split
    t = torch.ones((2, 7))
    print(t)
    
    list_of_tensor = torch.chunk(t, dim=1, chunks=3)
    print(list_of_tensor)
    
    tensor([[1., 1., 1., 1., 1., 1., 1.],
            [1., 1., 1., 1., 1., 1., 1.]])
    (tensor([[1., 1., 1.],
            [1., 1., 1.]]), 
    tensor([[1., 1., 1.],
            [1., 1., 1.]]),
    tensor([[1.], [1.]]))
    
  • torch.split(): Split the tensor along dimension dim
    • Function: split the tensor along dimension dim
    • Return value: a tuple of tensors
    • split_size_or_sections: when it is an int, it gives the length of each chunk; when it is a list, the tensor is split according to the list elements
    t = torch.ones((2, 7))
    print(t)
    list_of_tensor_2 = torch.split(t, 3, dim=1)
    print(list_of_tensor_2)
    
    list_of_tensor_3 = torch.split(t, [2, 2, 3], dim=1)
    print(list_of_tensor_3)
    
    tensor([[1., 1., 1., 1., 1., 1., 1.],
            [1., 1., 1., 1., 1., 1., 1.]])
    (tensor([[1., 1., 1.],
            [1., 1., 1.]]), tensor([[1., 1., 1.],
            [1., 1., 1.]]), tensor([[1.],
            [1.]]))
    (tensor([[1., 1.],
            [1., 1.]]), tensor([[1., 1.],
            [1., 1.]]), tensor([[1., 1., 1.],
            [1., 1., 1.]]))
    

Note: when a list is passed, the sum of its elements must equal the length of the dimension being split.

4.1.3. Indexing
  • torch.index_select(): In dimension dim, select data by index

    • Function: in dimension dim, select data according to the indices in index
    • Return value: a tensor formed by concatenating the selected data
    t = torch.randint(0, 9, (3, 3))
    print(t)
    
    # index_select
    idx=torch.tensor([0,2],dtype=torch.long)
    t_index_select=torch.index_select(t,index=idx,dim=0)
    print(t_index_select)
    
    tensor([[7, 1, 5],
            [1, 3, 4],
            [3, 4, 0]])
    tensor([[7, 1, 5],
            [3, 4, 0]])
    
  • torch.masked_select(): Select the elements where mask is True and return a one-dimensional tensor.

    • Function: select elements where mask is True
    • Return value: a one-dimensional tensor
    • input: the tensor to index
    • mask: a boolean tensor with the same shape as input
    t = torch.randint(0, 9, (3, 3))
    print(t)
    
    # masked_select
    mask = t.ge(5)
    print(mask)
    
    t_masked_select = torch.masked_select(t, mask)
    print(t_masked_select)
    
    tensor([[3, 8, 1],
            [6, 4, 1],
            [4, 8, 2]])
    tensor([[False,  True, False],
            [ True, False, False],
            [False,  True, False]])
    tensor([8, 6, 8])
    
4.1.4. Transformation
  • torch.reshape()

    • Function: Change the shape of a tensor
    • Note: when the tensor is contiguous in memory, the new tensor shares memory with the input (see the quick check after the example below)
    • input: the tensor to reshape
    • shape: the shape of the new tensor
    # torch.reshape
    t = torch.randperm(8)
    print(t)
    t_reshape = torch.reshape(t, (2, 4))  # a dimension given as -1 is inferred from the others
    print(t_reshape)
    
    tensor([2, 3, 1, 4, 0, 5, 7, 6])
    tensor([[2, 3, 1, 4],
            [0, 5, 7, 6]])
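
    A quick check of the shared-memory note above (a sketch; the exact values depend on the random permutation):
    t[0] = 1024
    print(t)           # the first element of t is now 1024
    print(t_reshape)   # the first element here is 1024 as well, since the memory is shared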
    
  • torch.transpose(): Swap two dimensions of a tensor

    # torch.transpose
    t = torch.rand((2, 3, 4))
    print(t)
    t_transpose = torch.transpose(t, dim0=1, dim1=2)
    print(t_transpose)
    
    tensor([[[0.5063, 0.6772, 0.8968, 0.4836],
             [0.0820, 0.5198, 0.1273, 0.1895],
             [0.3535, 0.9936, 0.7150, 0.4375]],
    
            [[0.7801, 0.9114, 0.2901, 0.7171],
             [0.0553, 0.9102, 0.4060, 0.4010],
             [0.1037, 0.1053, 0.7860, 0.4523]]])
    tensor([[[0.5063, 0.0820, 0.3535],
             [0.6772, 0.5198, 0.9936],
             [0.8968, 0.1273, 0.7150],
             [0.4836, 0.1895, 0.4375]],
    
            [[0.7801, 0.0553, 0.1037],
             [0.9114, 0.9102, 0.1053],
             [0.2901, 0.4060, 0.7860],
             [0.7171, 0.4010, 0.4523]]])
    
  • torch.t(): 2-dimensional tensor transpose, for matrices, equivalent to torch.transpose(input, 0, 1)

  • torch.squeeze(): remove dimensions (axes) of length 1

    • Function: Remove dimensions (axes) with length 1
    • dim: if None, all axes of length 1 are removed; if a dimension is specified, that axis is removed if and only if its length is 1
    # torch.squeeze
    t=torch.rand((1,2,3,1))
    
    t1=torch.squeeze(t)
    print(t1.shape)
    
    t2=torch.squeeze(t,dim=2)
    print(t2.shape)
    
    torch.Size([2, 3])
    torch.Size([1, 2, 3, 1])
    
  • torch.unsqueeze(): insert a dimension of length 1 at position dim

    • Function: Expand the tensor by inserting a new dimension at dim (a minimal sketch follows)
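    A minimal sketch (for illustration only):
    t = torch.rand(2, 3)
    t_unsq = torch.unsqueeze(t, dim=0)
    print(t_unsq.shape)   # torch.Size([1, 2, 3])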

4.2. Mathematical operations on tensors

(Figure: overview of tensor math operations — addition, subtraction, multiplication and division; exponential, logarithmic and power functions; and trigonometric functions.)

Here we focus on the addition function, because it has a small but useful detail: torch.add(input, alpha=1, other, out=None) computes input + alpha * other element-wise. The alpha argument is a scaling factor, something like a weight, and it makes some computations more concise. For example, in linear regression we have y = wx + b, which can then be written in a single line: torch.add(b, w, x).

  • torch.add(): Calculate input+alpha×other element by element


# torch.add
t_0 = torch.randn((3, 3))
t_1 = torch.ones_like(t_0)
# note: this is the old positional form torch.add(input, alpha, other);
# in recent PyTorch versions, write torch.add(t_0, t_1, alpha=10) instead
t_add = torch.add(t_0, 10, t_1)
print("t_0:\n{}\nt_1:\n{}\nt_add_10:\n{}".format(t_0, t_1, t_add))
t_0:
tensor([[ 0.6614,  0.2669,  0.0617],
        [ 0.6213, -0.4519, -0.1661],
        [-1.5228,  0.3817, -1.0276]])
t_1:
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
t_add_10:
tensor([[10.6614, 10.2669, 10.0617],
        [10.6213,  9.5481,  9.8339],
        [ 8.4772, 10.3817,  8.9724]])

  • torch.addcdiv(): computes input + value * tensor1 / tensor2 element-wise
  • torch.addcmul(): computes input + value * tensor1 * tensor2 element-wise (a minimal sketch follows)
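
A minimal sketch of the two fused operations (for illustration only):

t = torch.zeros(3)
t1 = torch.tensor([1., 2., 3.])
t2 = torch.tensor([2., 2., 2.])
print(torch.addcmul(t, t1, t2, value=10))  # t + 10 * t1 * t2 -> tensor([20., 40., 60.])
print(torch.addcdiv(t, t1, t2, value=10))  # t + 10 * t1 / t2 -> tensor([ 5., 10., 15.])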

4.3. Linear regression

Linear regression is a method for analyzing the relationship between one (dependent) variable and one or more other (independent) variables. The dependent variable is y, the independent variable is x, and the relationship between them is linear:

y = wx + b

The task is to solve for w and b.

Our solution steps:

  • 1. Determine the model: y = wx + b
  • 2. Select the loss function: here we use MSE, loss = (1/m) * Σ (yᵢ - ŷᵢ)²
  • 3. Compute the gradients and update w and b: w = w - lr * w.grad, b = b - lr * b.grad (lr is the learning rate)

This is the "code logic" I mentioned above: when writing code, it helps to form this outline first, and the code then comes more easily. If you skip a systematic study of PyTorch and start directly with complex models such as CNNs or LSTMs, it is hard to form this code logic, because too many details are unfamiliar. So this study starts with the simplest linear regression and then gradually moves on to more complex networks. Next we write the linear regression model:

# -*- coding:utf-8 -*-
"""
@file name  : lesson-03-Linear-Regression.py
@author     : TingsongYu https://github.com/TingsongYu
@date       : 2018-10-15
@brief      : univariate linear regression model
"""
import torch
import matplotlib.pyplot as plt
torch.manual_seed(10)

lr = 0.05  # learning rate (revised 2019-10-15)

# create the training data
x = torch.rand(20, 1) * 10  # x data (tensor), shape=(20, 1)
y = 2*x + (5 + torch.randn(20, 1))  # y data (tensor), shape=(20, 1)

# build the linear regression parameters
w = torch.randn((1), requires_grad=True)
b = torch.zeros((1), requires_grad=True)

for iteration in range(1000):

    # forward pass
    wx = torch.mul(w, x)
    y_pred = torch.add(wx, b)

    # compute the MSE loss
    loss = (0.5 * (y - y_pred) ** 2).mean()

    # backward pass
    loss.backward()

    # update the parameters
    b.data.sub_(lr * b.grad)
    w.data.sub_(lr * w.grad)

    # zero the gradients of the tensors (added 2019-10-15)
    w.grad.zero_()
    b.grad.zero_()

    # plotting
    if iteration % 20 == 0:

        plt.scatter(x.data.numpy(), y.data.numpy())
        plt.plot(x.data.numpy(), y_pred.data.numpy(), 'r-', lw=5)
        plt.text(2, 20, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 20, 'color': 'red'})
        plt.xlim(1.5, 10)
        plt.ylim(8, 28)
        plt.title("Iteration: {}\nw: {} b: {}".format(iteration, w.data.numpy(), b.data.numpy()))
        plt.pause(1.5)

        if loss.data.numpy() < 1:
            break

That's it for today's content; let's briefly summarize, since there are quite a few small details.

  • First, we started from PyTorch's most basic data structure and learned what a tensor is: essentially a multi-dimensional array. A tensor has many attributes, some describing the data itself (data, dtype, shape, device) and some related to differentiation (requires_grad, grad, grad_fn, is_leaf);
  • Then we learned how to create tensors: direct creation, creation from arrays, creation from numerical values, creation from probability distributions, etc. This involves many creation functions: tensor(), from_numpy(), ones(), zeros(), eye(), full(), arange(), linspace(), normal(), randn(), rand(), randint(), randperm(), etc.;
  • Next came tensor operations, covering basic operations and mathematical operations. The basic operations include two functions for concatenating tensors (.cat, .stack), two for splitting tensors (.chunk, .split), functions for reshaping and transposing (.reshape, .transpose, .t), and two for indexing (.index_select, .masked_select). The mathematical operations include many functions: addition, subtraction, multiplication and division; exponential, logarithmic and power functions; and many trigonometric functions;
  • Finally, a simple linear regression was completed based on what was learned above.

We covered a lot of functions this time, and each has its own usage. You don't need to memorize the details now; it is enough to know which function does what, look up the specifics when you need them, and familiarity will come with practice.

5. Computational graph and dynamic graph mechanism

5.1. Calculation graph

Deep learning is a series of operations on tensors. As the types and number of operations grow, various questions arise: can multiple operations be parallelized, how should different underlying devices cooperate, and how can redundant operations be avoided so that computation is as efficient as possible while still avoiding bugs? This is why the computational graph (Computational Graph) was created.

A computational graph is a directed acyclic graph used to describe operations. It has two main elements: nodes and edges.

  • Nodes represent data, such as vectors, matrices, and tensors.
  • Edges represent operations, such as addition, subtraction, multiplication, division, convolution, etc.

Representing y = (x + w) * (w + 1) with a computational graph, as shown below:

  • a = x + w
  • b = w + 1
  • y = a * b

(Figure: computational graph of y = (x + w) * (w + 1), with leaf nodes w and x.)

That is, y = a * b, where a = x + w and b = w + 1.

Computational graph and gradient derivation: here we compute the derivative of y with respect to w. By the chain rule for composite functions, the derivation proceeds as follows:

$$
\begin{aligned}
\frac{\partial y}{\partial w} &= \frac{\partial y}{\partial a} \frac{\partial a}{\partial w} + \frac{\partial y}{\partial b} \frac{\partial b}{\partial w} \\
&= b * 1 + a * 1 \\
&= b + a \\
&= (w+1) + (x+w) \\
&= 2w + x + 1 \\
&= 2 * 1 + 2 + 1 = 5
\end{aligned}
$$

Reflected in the calculation graph, there are two paths from the root node y to the leaf node w, y -> a -> w and y -> b -> w. The root node takes the derivatives of the child nodes of each path in turn, all the way to the leaf node w, and finally adds the derivatives of each path.

The code is shown below:

import torch
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
# y = (x + w) * (w + 1)
a = torch.add(w, x)     # retain_grad()
b = torch.add(w, 1)
y = torch.mul(a, b)
# differentiate y
y.backward()
# print the gradient of w, i.e. the derivative of y with respect to w
print(w.grad)
# result: tensor([5.])

Looking back at the Tensor mentioned earlier, there is an attribute is_leaf that marks whether it is a leaf node.

In the above example, x and w are leaf nodes, and all other nodes depend on leaf nodes. The concept of leaf nodes is mainly to save memory. After a round of backpropagation in the calculation graph is completed, the gradients of non-leaf nodes will be released.

  • Leaf node: The node created by the user is called a leaf node, such as X and W

  • is_leaf: Indicates whether the tensor is a leaf node

  • The point of marking leaf nodes is to save memory: only leaf nodes keep their gradients, and the gradients of intermediate (non-leaf) variables are freed during backpropagation. If you want to keep the gradient of an intermediate variable, call retain_grad() on it.

  • grad_fn: records the method (function) used to create the tensor

Code example:

# check which tensors are leaf nodes
print("is_leaf:\n", w.is_leaf, x.is_leaf, a.is_leaf, b.is_leaf, y.is_leaf)

# check the gradients
print("gradient:\n", w.grad, x.grad, a.grad, b.grad, y.grad)

The result is:

is_leaf:
 True True False False False
gradient:
 tensor([5.]) tensor([2.]) None None None

The gradients of non-leaf nodes are empty. If you still need the gradient of a non-leaf node after backpropagation is complete, call the retain_grad() method on that node, as in the sketch below.
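
A minimal sketch of retain_grad() (for illustration only):

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
a.retain_grad()          # ask autograd to keep a.grad after backward
b = torch.add(w, 1)
y = torch.mul(a, b)

y.backward()
print(a.grad)            # tensor([2.]), since dy/da = b = w + 1 = 2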

The grad_fn attribute in Tensor records the method (function) used to create the tensor. This property is needed when backpropagating the gradient.

Sample code:

# check grad_fn
print("w.grad_fn = ", w.grad_fn)
print("x.grad_fn = ", x.grad_fn)
print("a.grad_fn = ", a.grad_fn)
print("b.grad_fn = ", b.grad_fn)
print("y.grad_fn = ", y.grad_fn)

The result is

w.grad_fn =  None
x.grad_fn =  None
a.grad_fn =  <AddBackward0 object at 0x000001D8DDD20588>
b.grad_fn =  <AddBackward0 object at 0x000001D8DDD20588>
y.grad_fn =  <MulBackward0 object at 0x000001D8DDD20588>

5.2. PyTorch’s dynamic graph mechanism

According to the way the calculation graph is built, the calculation graph can be divided into dynamic graphs and static graphs.

Dynamic graphs vs. static graphs:


  • Dynamic graph:
    • Computation and graph construction happen simultaneously
    • Flexible and easy to adjust

For example, PyTorch uses dynamic graphs.

  • Static graph:
    • Build the graph first, then perform the computation
    • Efficient but not flexible.

TensorFlow uses static graphs.

PyTorch uses a dynamic graph mechanism (Dynamic Computational Graph), while Tensorflow uses a static graph mechanism (Static Computational Graph).

Dynamic graphs are built as the computation runs: the values of earlier nodes are computed first, and the rest of the graph is then built on top of those values. The advantages are flexibility, easy adjustment, and easy debugging; much PyTorch code reads exactly like ordinary Python code using other libraries, with no extra learning cost.

For static graphs, the graph is built first and the data is then fed in for computation. The advantage is efficiency: because the computation is defined once and then run, the graph does not need to be rebuilt on subsequent runs, so it is faster than a dynamic graph. The drawback is inflexibility: the graph is the same on every TensorFlow run and cannot be changed, so you cannot use Python's while loop directly; you have to express it in TensorFlow's own form with the helper function tf.while_loop. The sketch below shows how ordinary Python control flow drives a PyTorch dynamic graph.
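
A minimal sketch (for illustration only) of Python control flow building a PyTorch dynamic graph step by step:

import torch

x = torch.tensor([1.0], requires_grad=True)
y = x
# an ordinary Python while loop: each iteration extends the graph as it runs
while y.item() < 10:
    y = y * 2
y.backward()
print(x.grad)  # tensor([16.]), since y = 16 * x after four doublings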

6. autograd and logistic regression

This lesson has two parts: PyTorch's automatic differentiation system and the logistic regression model. Training a deep model means continually updating the weights, and updating the weights requires gradients, so gradients are crucial to model training. Computing gradients by hand is tedious, however, so PyTorch provides an automatic differentiation system to handle this for us. In PyTorch we do not compute gradients manually: we only need to build the computational graph in the forward pass and then use PyTorch's autograd methods to get the gradients of all tensors.

6.1. The autograd automatic differentiation system

  • torch.autograd.backward()
    • Function: Automatically find the gradient of each node in the calculation graph.
torch.autograd.backward(
    tensors,
    grad_tensors=None,
    retain_graph=None,
    create_graph=False
)

The main parameters:

  • tensors: Tensors used for derivation, such as loss.
  • retain_graph: Save the calculation graph. PyTorch discards the calculation graph after backpropagation is completed by default. If you need to save it, set this item to True.
  • create_graph: Create a derivative calculation graph for high-order derivation.
  • grad_tensors: Multi-gradient weights. When we have multiple losses and need to calculate gradients, we need to set the weight ratio of each loss.

Let’s review how to solve for gradients via computational graphs:

y = (x + w) * (w + 1)

  • a = x + w
  • b = w + 1
  • y = a * b

$$
\begin{aligned}
\frac{\partial y}{\partial w} &= \frac{\partial y}{\partial a} \frac{\partial a}{\partial w} + \frac{\partial y}{\partial b} \frac{\partial b}{\partial w} \\
&= b * 1 + a * 1 \\
&= (w+1) + (x+w) \\
&= 2w + x + 1 \\
&= 2 * 1 + 2 + 1 = 5
\end{aligned}
$$

Code example:

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
b = torch.add(w, 1)
y = torch.mul(a, b)

# if you want to run backward on this graph again later, set retain_graph=True
# y.backward(retain_graph=True)

y.backward()
print(w.grad)

Output result:

tensor([5.])

When there are multiple losses that need to calculate gradients, set the weight ratio of each loss through grad_tensors:

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
b = torch.add(w, 1)

# y0 = (x+w) * (w+1)    dy0/dw = 2*w + x + 1 = 5
y0 = torch.mul(a, b)

# y1 = (x+w) + (w+1)    dy1/dw = 2
y1 = torch.add(a, b)  

# in this case, loss is a vector [y0, y1]
loss = torch.cat([y0, y1], dim=0)

# gradient weights: dy0/dw has weight 1, dy1/dw has weight 2
grad_tensors = torch.tensor([1., 2.])

# the gradient argument is passed on to grad_tensors in torch.autograd.backward()
loss.backward(gradient=grad_tensors)

print(w.grad) # 5*1 + 2*2 = 9

Output result:

tensor([9.])
  • torch.autograd.grad()
    • Function: Find the gradient.
torch.autograd.grad(
    outputs,
    inputs,
    grad_outputs=None,
    retain_graph=None,
    create_graph=False
)

The main parameters:

  • outputs: Tensor used for derivation, such as loss.
  • inputs: Tensors requiring gradients.
  • create_graph: Create a derivative calculation graph for high-order derivation.
  • retain_graph: Save the calculation graph.
  • grad_outputs: multi-gradient weights.

Find the second-order gradient:

x = torch.tensor([3.], requires_grad=True)
y = torch.pow(x, 2)  # y = x**2

# grad_1 = dy/dx = 2x = 2 * 3 = 6
grad_1 = torch.autograd.grad(y, x, create_graph=True)  
print(grad_1)

# grad_2 = d(dy/dx)/dx = d(2x)/dx = 2
grad_2 = torch.autograd.grad(grad_1[0], x)  
print(grad_2)

Output result:

(tensor([6.], grad_fn=<MulBackward0>),)
(tensor([2.]),)

Precautions:

  • Gradient is not automatically cleared.
  • For nodes that depend on leaf nodes, requires_grad defaults to True.
  • Leaf nodes cannot perform in-place operations.

Code example 1:

# 1. gradients are not cleared automatically: repeated backward calls accumulate them;
#    use .grad.zero_() to clear them manually
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

for i in range(3):
    a = torch.add(w, x)
    b = torch.add(w, 1)
    y = torch.mul(a, b)

    y.backward()
    print(w.grad)

# clear the gradient; the trailing underscore indicates an in-place operation
w.grad.zero_()

for i in range(3):
    a = torch.add(w, x)
    b = torch.add(w, 1)
    y = torch.mul(a, b)

    y.backward()
    print(w.grad)
    w.grad.zero_()

Output result:

tensor([5.])
tensor([10.])
tensor([15.])
tensor([5.])
tensor([5.])
tensor([5.])

Code example 2:

# 2. for nodes that depend on leaf nodes, requires_grad defaults to True
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
b = torch.add(w, 1)
y = torch.mul(a, b)

print(a.requires_grad, b.requires_grad, y.requires_grad)

Output result:

True True True

Code example 3:

# 3. leaf nodes cannot be modified in place. In the computational graph, PyTorch refers
#    to a leaf node's value by the address recorded during the forward pass, so to
#    prevent incorrect results, in-place operations on leaf nodes are forbidden.

#    in-place operation: modify the data directly at its original memory address.
#    non-in-place operation: allocate a new memory address to store the modified data.

a = torch.ones((1, ))
print(id(a), a)

# non-in-place operation
a = a + torch.ones((1, ))
print(id(a), a)

# in-place operation
a += torch.ones((1, ))
print(id(a), a)

Output result:

4875211904 tensor([1.])
4875212336 tensor([2.])
4875212336 tensor([3.])

Performing in-place operations on leaf nodes will result in an error:

w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)

a = torch.add(w, x)
b = torch.add(w, 1)
y = torch.mul(a, b)

# non-in-place operation on the non-leaf node a
print(a.add(1))

# in-place operation on the non-leaf node a
print(a.add_(1))

# non-in-place operation on the leaf node w
print(w.add(1))

# in-place operation on the leaf node w: this raises an error
print(w.add_(1))

y.backward()

Output result:

tensor([4.], grad_fn=<AddBackward0>)
tensor([4.], grad_fn=<AddBackward0>)
tensor([2.], grad_fn=<AddBackward0>)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/andy/PycharmProjects/hello_pytorch/lesson/lesson-05/lesson-05-autograd.py", line 145, in <module>
    print(w.add_(1))
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

6.2. Logistic regression

Logistic Regression is a linear binary classification model.
Model expression:
$$
\begin{aligned}
y &= f(WX + b) \\
f(x) &= \frac{1}{1 + e^{-x}}
\end{aligned}
$$

That is:

$$
y = \frac{1}{1 + e^{-(WX + b)}}
$$

Here f(x) is called the sigmoid function, also known as the logistic function. The class label is obtained by thresholding y at 0.5:

$$
\text{class} = \begin{cases} 0, & y < 0.5 \\ 1, & y \geq 0.5 \end{cases}
$$


Linear regression analyzes the relationship between the independent variable x and a dependent variable y that is a scalar; logistic regression analyzes the relationship between the independent variable x and a dependent variable y that is a probability.


The five steps of machine learning model training:


  • 1. Data: data collection, cleaning, partitioning, and preprocessing.
  • 2. Model: Depending on the difficulty of the task, choose a simple linear model or a complex neural network model, etc.
  • 3. Loss function: Choose different loss functions according to different tasks and calculate their gradients. For example: in linear regression, we can choose the mean square error loss function; in classification tasks, we can choose the cross-entropy loss function.
  • 4. Optimizer: After getting the gradient, we choose some kind of optimizer to update the weights.
  • 5. Iterative training: After having the data, model, loss function and optimizer, we can perform iterative training.

Code example:

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
torch.manual_seed(10)

# ============================== Step 1/5: generate data ===================================
sample_nums = 100
mean_value = 1.7
bias = 1
n_data = torch.ones(sample_nums, 2)
x0 = torch.normal(mean_value * n_data, 1) + bias    # class 0 data, shape=(100, 2)
y0 = torch.zeros(sample_nums)                       # class 0 labels, shape=(100,)
x1 = torch.normal(-mean_value * n_data, 1) + bias   # class 1 data, shape=(100, 2)
y1 = torch.ones(sample_nums)                        # class 1 labels, shape=(100,)
train_x = torch.cat((x0, x1), 0)
train_y = torch.cat((y0, y1), 0)


# ============================== Step 2/5: choose the model ===================================
class LR(nn.Module):
    def __init__(self):
        super(LR, self).__init__()
        self.features = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.features(x)
        x = self.sigmoid(x)
        return x


lr_net = LR()   # instantiate the logistic regression model

# ============================== Step 3/5: choose the loss function ================================
loss_fn = nn.BCELoss()  # binary cross-entropy loss

# ============================== Step 4/5: choose the optimizer ==================================
lr = 0.01   # learning rate
optimizer = torch.optim.SGD(lr_net.parameters(), lr=lr, momentum=0.9)    # stochastic gradient descent

# ============================== Step 5/5: train the model ====================================
for iteration in range(1000):

    # forward pass
    y_pred = lr_net(train_x)

    # compute the loss
    loss = loss_fn(y_pred.squeeze(), train_y)

    # backward pass
    loss.backward()

    # update the parameters
    optimizer.step()

    # clear the gradients (gradients are not cleared automatically)
    optimizer.zero_grad()

    # plotting
    if iteration % 20 == 0:

        mask = y_pred.ge(0.5).float().squeeze()  # classify with a threshold of 0.5
        correct = (mask == train_y).sum()        # number of correctly predicted samples
        acc = correct.item() / train_y.size(0)   # classification accuracy

        plt.scatter(x0.data.numpy()[:, 0], x0.data.numpy()[:, 1], c='r', label='class 0')
        plt.scatter(x1.data.numpy()[:, 0], x1.data.numpy()[:, 1], c='b', label='class 1')

        w0, w1 = lr_net.features.weight[0]
        w0, w1 = float(w0.item()), float(w1.item())
        plot_b = float(lr_net.features.bias[0].item())
        plot_x = np.arange(-6, 6, 0.1)
        plot_y = (-w0 * plot_x - plot_b) / w1

        plt.xlim(-5, 7)
        plt.ylim(-7, 7)
        plt.plot(plot_x, plot_y)

        plt.text(-5, 5, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 20, 'color': 'red'})
        plt.title("Iteration: {}\nw0:{:.2f} w1:{:.2f} b:{:.2f} accuracy:{:.2%}".format(iteration, w0, w1, plot_b, acc))
        plt.legend()

        plt.show()
        plt.pause(0.5)

        if acc > 0.99:
            break

6.3. Summary

This lesson introduced the two commonly used methods of PyTorch's automatic differentiation system, torch.autograd.backward and torch.autograd.grad, and demonstrated first-order and second-order differentiation. We now understand the automatic differentiation system: tensors carry the data, the forward pass builds the computational graph, and the graph is used to compute the gradients. With this knowledge we can start training machine learning models properly. By walking through the training of a logistic regression model, we learned the five main modules of a machine learning model: data, model, loss function, optimizer, and the iterative training process. These five modules will be the main thread of the lessons that follow.


Origin: blog.csdn.net/fb_941219/article/details/129895471