[Deep Learning Experiment] Feedforward Neural Network (6): Automatic Differentiation

Table of contents

1. Experiment introduction

2. Experimental environment

   1. Configure the virtual environment

   2. Library version introduction

3. Experimental content

   0. Import necessary toolkits

   1. Scalar differentiation

   2. Matrix differentiation

   3. Computation graph


1. Experiment introduction

        PyTorch provides an automatic differentiation mechanism (autograd), one of its core features, which is used to compute gradients and perform backpropagation. Automatic differentiation makes gradient computation in deep learning simpler and more efficient.

2. Experimental environment

    This series of experiments uses the PyTorch deep learning framework. The relevant operations are as follows:

1. Configure the virtual environment

conda create -n DL python=3.7
conda activate DL
pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
conda install matplotlib
conda install scikit-learn

2. Library version introduction

Package         Version used in this experiment    Latest version (at time of writing)
matplotlib      3.5.3                              3.8.0
numpy           1.21.6                             1.26.0
python          3.7.16
scikit-learn    0.22.1                             1.3.0
torch           1.8.1+cu102                        2.0.1
torchaudio      0.8.1                              2.0.2
torchvision     0.9.1+cu102                        0.15.2

3. Experimental content

(The following overview of feedforward neural networks was generated with ChatGPT.)

        A feedforward neural network, also known as a multilayer perceptron (MLP), is a common artificial neural network model. It is based on forward propagation and is mainly used to solve classification and regression problems.

        A feedforward neural network consists of multiple layers: an input layer, one or more hidden layers, and an output layer. The name "feedforward" reflects the fact that signals flow only forward through the network, from the input layer through the hidden layers to the output layer, with no feedback connections.

Here's how feedforward neural networks generally work:

  1. Input layer: receives raw data or feature vectors as input to the network, and each input is represented as a neuron of the network. Each neuron weights the input and transforms it through an activation function to produce an output signal.

  2. Hidden layer: A feedforward neural network can contain one or more hidden layers, each consisting of multiple neurons. The neurons in the hidden layer receive input from the previous layer and pass the weighted sum of the signal transformed by the activation function to the next layer.

  3. Output layer: The output of the last hidden layer is passed to the output layer, which usually consists of one or more neurons. The neurons in the output layer use appropriate activation functions (such as Sigmoid, Softmax, etc.) according to the type of problem to be solved (classification or regression) to output the final result.

  4. Forward propagation: The process of transmitting signals from the input layer through the hidden layer to the output layer is called forward propagation. During forward propagation, each neuron multiplies the output of the previous layer by the corresponding weight and passes the result to the next layer. Such calculations are performed layer by layer through each layer in the network until the final output is produced.

  5. Loss function and training: The training process of a feedforward neural network usually involves defining a loss function that measures the difference between the model's predicted output and the true label. Common loss functions include Mean Squared Error and Cross-Entropy. By using backpropagation and optimization algorithms (such as gradient descent), the network adjusts its parameters according to the gradient of the loss function to minimize its value, as sketched in the code after this list.
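
        As a rough, minimal sketch of the forward pass and training step described above (the layer sizes, ReLU activation, and cross-entropy loss here are illustrative choices, not part of this experiment's code):

import torch
import torch.nn as nn

# Sketch of an MLP: input layer -> hidden layer -> output layer (assumed sizes: 4 -> 8 -> 3)
model = nn.Sequential(
    nn.Linear(4, 8),   # weighted sum from the input layer to the hidden layer
    nn.ReLU(),         # activation function of the hidden layer
    nn.Linear(8, 3),   # weighted sum from the hidden layer to the output layer
)
inputs = torch.randn(2, 4)                      # a batch of 2 samples with 4 features each
labels = torch.tensor([0, 2])                   # true class labels for the batch
outputs = model(inputs)                         # forward propagation, layer by layer
loss = nn.CrossEntropyLoss()(outputs, labels)   # loss between predictions and labels
loss.backward()                                 # backpropagation computes the gradients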

        The advantages of feedforward neural networks include the ability to model complex nonlinear relationships, suitability for a variety of problem types, and the ability to automatically learn feature representations through training. However, they also face challenges, such as a tendency to overfit and difficulty handling large-scale, high-dimensional data. To address these challenges, improved network structures and training techniques have been proposed, such as convolutional neural networks and recurrent neural networks.

This series is experimental content and does not explain theoretical knowledge in detail.

(Ahem, I actually don’t have time to sort it out. I’ll come back and fill in the gaps when I have the opportunity)


0. Import necessary toolkits

import torch

1. Scalar differentiation

        Differentiating a function with a single output value yields a scalar gradient.

# The simplest case: x is a scalar
x = torch.tensor(2, dtype=torch.float32, requires_grad=True)
y = x ** 2 + 4 * x
print(x.grad)   # None: backward() has not been called yet
y.backward()    # compute dy/dx
print(x.grad)   # tensor(8.)
  • Create a tensor x with value 2 and data type float32, and set requires_grad=True to enable automatic differentiation.

  • y = x ** 2 + 4 * x: defines a new tensor y whose value is x squared plus 4 times x.

  • Print the gradient of x before calling backward(). Since backpropagation has not been performed yet, x.grad is None.

  • Calling backward() computes the gradient of y with respect to every tensor that requires gradients. In this case only x requires a gradient, so x.grad is populated.

  • After calling backward(), print the gradient of x. Since y is a function of x and we backpropagated through it, x.grad now holds the derivative of y with respect to x.

Output:

None
tensor(8.)
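
        As a check: since y = x ** 2 + 4 * x, the derivative is dy/dx = 2x + 4, which at x = 2 equals 8, matching the printed gradient.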

2. Matrix differentiation

        Differentiating a function with multiple output values yields a gradient in the form of a matrix or vector.

x = torch.ones(2, 2, requires_grad=True)
print(x.grad)   # None: backward() has not been called yet
# y is a matrix
y = x ** 2 + 4 * x
y.backward(torch.ones(2, 2))   # non-scalar output: pass an upstream gradient of all ones
print(x.grad)
  • Create a 2x2 tensor x with all elements equal to 1, and set requires_grad=True to enable automatic differentiation.

  • Print the gradient of x before calling backward(). Since backpropagation has not been performed yet, x.grad is None.

  • Define a new tensor y whose elements are each element of x squared plus 4 times that element. Since x has shape 2x2, y has the same shape.

  • y.backward(torch.ones(2, 2)): because y is not a scalar, backward() requires a gradient argument; torch.ones(2, 2) supplies an upstream gradient of all ones, weighting every element of y equally. Only x requires a gradient, so x.grad is populated.

  • print(x.grad): prints the gradient of x. Since y is a function of x and we backpropagated through it, x.grad now holds the derivative of y with respect to x.

Output:

None
tensor([[6., 6.],
        [6., 6.]])
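
        Passing torch.ones(2, 2) to backward() is equivalent to differentiating the sum of the elements of y. A minimal sketch of that equivalence (the names x2 and y2 are just illustrative):

x2 = torch.ones(2, 2, requires_grad=True)
y2 = x2 ** 2 + 4 * x2
y2.sum().backward()    # same gradient as y2.backward(torch.ones(2, 2))
print(x2.grad)         # tensor([[6., 6.], [6., 6.]])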

x = torch.ones(2, 2, requires_grad=True)
# y is a matrix
y = x ** 2 + 4 * x
y.backward(torch.ones(2, 2))
print(x.grad)   # tensor of 6s, as in the previous example

u = x ** 3 + 2 * x
# z is a scalar
z = u.sum()
z.backward()    # no gradient argument is needed for a scalar output
print(x.grad)   # 6 + 5 = 11: gradients accumulate in x.grad
  • u = x ** 3 + 2 * x: defines a new tensor u whose elements are each element of x cubed plus 2 times that element. Since x has shape 2x2, u has the same shape.

  • z = u.sum(): defines a scalar z that is the sum of all elements of u.

  • z.backward(): computes the gradient of z with respect to every tensor that requires gradients. Since z is a scalar, no gradient argument is needed.

  • print(x.grad): prints the gradient of x. The derivative of z with respect to x is 3x^2 + 2 = 5 at x = 1, but because gradients accumulate in x.grad across backward() calls, the previous value 6 is added to it, giving 11.

Output:

tensor([[6., 6.],
        [6., 6.]])
tensor([[11., 11.],
        [11., 11.]])
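
        Because gradients accumulate in x.grad across backward() calls, the second result is 6 + 5 = 11. If only the gradient of z is wanted, the accumulated gradient can be cleared first, for example with x.grad.zero_(). A minimal sketch:

x = torch.ones(2, 2, requires_grad=True)
y = x ** 2 + 4 * x
y.backward(torch.ones(2, 2))
x.grad.zero_()              # clear the accumulated gradient before the next backward pass
u = x ** 3 + 2 * x
z = u.sum()
z.backward()
print(x.grad)               # tensor([[5., 5.], [5., 5.]]) -- only dz/dx = 3x^2 + 2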

3. Computation graph

        A computational graph is a data structure used to represent the dependencies among mathematical operations. In deep learning, computational graphs are widely used for automatic differentiation and backpropagation.

        Computational graphs are composed of nodes and edges. Nodes represent operations or variables, and edges represent dependencies between operations. In a computational graph, variables are often called leaf nodes or input nodes, and operations are called internal nodes or compute nodes.
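
        A small sketch (not part of the original experiment) of how PyTorch exposes these nodes: leaf tensors have is_leaf set to True and no grad_fn, while tensors produced by operations record the producing operation in grad_fn, and grad_fn.next_functions points to the upstream operations they depend on.

import torch

a = torch.tensor(2.0, requires_grad=True)   # leaf node (input / variable)
b = a ** 2 + 3 * a                          # compute node built from operations on a
print(a.is_leaf, a.grad_fn)                 # True None
print(b.is_leaf, b.grad_fn)                 # False <AddBackward0 object at ...>
print(b.grad_fn.next_functions)             # edges to the operations feeding this node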

        The construction of a computational graph involves the following steps:

  1. Define the input nodes (leaf nodes): convert the input data into tensors and set their requires_grad attribute to True to track gradients.
  2. Define compute nodes: Build compute nodes using mathematical operations between tensors (such as addition, multiplication, square, etc.).
  3. Build a computational graph: Connect input nodes and computational nodes to form a directed acyclic graph, which represents the dependencies between operations.
  4. Forward propagation: By calculating the path from the input node to the output node of the graph, mathematical operations are performed sequentially according to the dependencies to calculate the value of the output node.
  5. Backpropagation: Starting from the output node, the gradient of each node is calculated along the reverse path of the computational graph. According to the chain rule, the gradient of each node can be calculated from the gradient of subsequent nodes and the local gradient of this node.
  6. Gradient update: Use calculated gradient values ​​to update the parameters of the model for optimization and training.
import torch

x = torch.tensor(2.0, requires_grad=True)   # leaf node
y = torch.tensor(3.0, requires_grad=True)   # leaf node
z = x**2 + y**3                             # compute node; builds the graph
z.backward()                                # backpropagate through the graph
print("Gradient of x:", x.grad)             # dz/dx = 2x = 4
print("Gradient of y:", y.grad)             # dz/dy = 3y^2 = 27

Output:

Gradient of x: tensor(4.)
Gradient of y: tensor(27.)
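
        A minimal sketch of step 6 (gradient update), using the gradients computed above with an assumed learning rate of 0.1; a real training loop would typically use an optimizer from torch.optim instead:

with torch.no_grad():        # parameter updates should not be recorded in the graph
    x -= 0.1 * x.grad        # 2.0 - 0.1 * 4.0  -> 1.6
    y -= 0.1 * y.grad        # 3.0 - 0.1 * 27.0 -> 0.3
x.grad.zero_()               # clear the gradients before the next iteration
y.grad.zero_()
print(x, y)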


Origin blog.csdn.net/m0_63834988/article/details/133130995