References:
- Quickstart (PyTorch Tutorials 2.0.1+cu117 documentation)
- pytorch basics - optimizing model parameters (6)_torch.optim.sgd(model.parameters(), lr=learning_ra (A little prairie dog's blog, CSDN Blog)
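1. Tensor
- A data structure similar to an array or matrix. Tensors encode the inputs, outputs, and parameters of a model.
- Index the last column with -1, e.g. tensor[..., -1]
- Concatenate tensors with torch.cat
dim selects the dimension to join along. Counting levels of square brackets from the outside in: dim=0 is the outermost level (rows of a 2-D tensor), dim=1 is one bracket level in (columns of a 2-D tensor).
- matplotlib is a 2D plotting library for Python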
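To make the indexing and dim notes concrete, here is a minimal sketch. The tensor t and its 3x4 shape are made up for illustration.

```python
import torch

# A small 3x4 tensor holding the values 0..11 (illustrative data only).
t = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# -1 selects the last column of every row.
last_col = t[..., -1]

# dim=0 stacks along the outermost level (more rows).
rows = torch.cat([t, t], dim=0)

# dim=1 joins one bracket level in (more columns).
cols = torch.cat([t, t], dim=1)

print(last_col)
print(rows.shape, cols.shape)
```

Joining along dim=0 changes the first size in the shape and joining along dim=1 changes the second, which is an easy way to check which dimension was used.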
2. Datasets and DataLoaders
Each sample is a pair: the image is the feature, and its class is the label.
- A custom dataset class must implement three functions:
__init__: runs once when the dataset object is instantiated; it sets up the images, the annotation file, and the two transforms (transform, target_transform)
__len__: returns the number of samples in the dataset
__getitem__: loads the sample at the given index, converts the image to a tensor, applies the transforms, and returns the image tensor with its corresponding label
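The three functions can be sketched as follows. This is not the tutorial's image-file dataset: the class name ToyImageDataset and the in-memory random "images" are assumptions so that the example runs without any files on disk.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyImageDataset(Dataset):
    """Hypothetical in-memory dataset used only for illustration."""

    def __init__(self, images, labels, transform=None, target_transform=None):
        # Runs once: store the data and the two optional transforms.
        self.images = images
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.labels)

    def __getitem__(self, idx):
        # Fetch one sample and apply the transforms if they were given.
        image = self.images[idx]
        label = self.labels[idx]
        if self.transform is not None:
            image = self.transform(image)
        if self.target_transform is not None:
            label = self.target_transform(label)
        return image, label


# Eight fake 1x28x28 "images" with integer labels 0..7.
images = torch.rand(8, 1, 28, 28)
labels = torch.arange(8)

dataset = ToyImageDataset(images, labels)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

X, y = next(iter(loader))
print(len(dataset), X.shape, y.shape)
```

The DataLoader calls __getitem__ for each index in a batch and stacks the results, so the batch gains a leading dimension of size batch_size.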
3. Transforms
transform and target_transform
Training needs the features as normalized tensors and the labels as one-hot encoded tensors. ToTensor and Lambda perform these conversions:
transform=ToTensor(),
target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
- ToTensor converts a PIL image or NumPy ndarray into a FloatTensor and scales the pixel intensity values of the image to the range [0., 1.]
- Lambda transform: wraps a user-defined lambda function. Here the function converts an integer label into a one-hot encoded tensor.
For six states, the natural (binary) state encoding is: 000, 001, 010, 011, 100, 101.
The one-hot encoding of the same six states is: 000001, 000010, 000100, 001000, 010000, 100000
target_transform = Lambda(lambda y: torch.zeros(
10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))
It first creates a zero tensor of size 10 (the number of labels in our dataset) and then calls scatter_, which assigns value=1 at the index given by the label y.
input.scatter_(dim, index, src): writes the data in src into the tensor input along dimension dim, at the positions given by index. It can be understood as placing or modifying elements in place.
- dim: along which dimension to index
- index: element index used for scatter
- src: the source elements used for scatter, given as a tensor; to scatter a single scalar, pass value= instead (as in the code above)
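As a small check of the scatter_ description, the sketch below one-hot encodes a single label. The helper name one_hot is hypothetical, and the index is wrapped in a one-element list so that it is a 1-D tensor.

```python
import torch


def one_hot(y, num_classes=10):
    """Hypothetical helper: one-hot encode the integer label y."""
    # Start from all zeros, then write 1 at position y along dim 0.
    return torch.zeros(num_classes, dtype=torch.float).scatter_(
        dim=0, index=torch.tensor([y]), value=1
    )


encoded = one_hot(3)
print(encoded)
```

Only the position named by the label is set to 1; every other entry stays 0.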
4. Build the model
- Define the neural network by subclassing nn.Module. Initialize the network layers in __init__; every nn.Module subclass implements its operations on the input data in the forward method.
- The nn.Flatten layer "flattens" the input, that is, it converts multi-dimensional input into one dimension. It is often used in the transition from convolutional layers to fully connected layers. For example, a sample of shape (3, 32, 64) is three-dimensional data with 3*32*64 = 6144 elements in total; pulled into a line, it has length 6144. By default nn.Flatten keeps the first (batch) dimension, so a batch of N such samples becomes shape (N, 6144).
- nn.Sequential: an ordered container class in torch.nn. The modules placed inside its brackets are the concrete structure of the network, and the data passes through them in the order given.
- nn.Linear: defines a linear layer of the model, which applies a linear transformation to its input. The parameters are (number of input features, number of output features, whether to use a bias (default True)). The weight and bias parameters of the matching dimensions are created automatically.
- nn.ReLU: a non-linear activation class. It needs no arguments by default when it is defined.
- logits: the raw output vector of the last layer, usually passed to softmax in the next step. Softmax is the normalized exponential function; it rescales the logits into probabilities that sum to 1.
- In plain terms, torch.rand generates uniformly distributed random data in the interval [0, 1). The number of integers passed in the brackets of torch.rand() is the number of dimensions of the resulting tensor.
x = torch.rand(3, 4): a 2-dimensional tensor with three rows and four columns
Three-dimensional tensors are also fairly easy to picture. A two-dimensional tensor can be regarded as a plane, and a three-dimensional tensor can be regarded as many two-dimensional tensor planes stacked in parallel.
For example, a common RGB image can be understood as three two-dimensional grayscale images (one per color channel) stacked together.
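The pieces described in this section fit together roughly as in the sketch below. The layer sizes (28*28 inputs, 512 hidden units, 10 outputs) and the batch of two random "images" are assumptions chosen for illustration; they are not stated in these notes.

```python
import torch
from torch import nn


class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers are created once, in __init__.
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),  # sizes are illustrative assumptions
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        # forward defines how the input flows through the layers.
        x = self.flatten(x)
        return self.linear_relu_stack(x)


model = NeuralNetwork()

# A batch of two random 28x28 inputs drawn uniformly from [0, 1).
X = torch.rand(2, 28, 28)
logits = model(X)

# Softmax over the class dimension turns logits into probabilities.
probs = nn.Softmax(dim=1)(logits)
pred = probs.argmax(dim=1)

# nn.Flatten keeps the batch dimension: (5, 3, 32, 64) -> (5, 6144).
flat = nn.Flatten()(torch.rand(5, 3, 32, 64))

print(logits.shape, probs.sum(dim=1), pred, flat.shape)
```

Calling model(X) runs forward; the rows of probs each sum to 1, and argmax picks the most probable class for each input.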
5. Autograd: automatic differentiation
Gradient: the derivative of the loss function with respect to the parameters
Backpropagation algorithm: the parameters (model weights) are adjusted according to the gradient of the loss function with respect to each parameter.
torch.autograd supports automatic calculation of gradients for any computational graph.
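A tiny worked case shows what autograd computes. The numbers below are made up: with w = (1, 2), x = (3, 4) and target 10, the prediction is w·x = 11, the squared error is 1, and the gradient with respect to w is 2 * (11 - 10) * x.

```python
import torch

# Parameters we want gradients for.
w = torch.tensor([1.0, 2.0], requires_grad=True)

# Fixed input and target (illustrative values).
x = torch.tensor([3.0, 4.0])
target = torch.tensor(10.0)

pred = (w * x).sum()          # 1*3 + 2*4 = 11
loss = (pred - target) ** 2   # (11 - 10)^2 = 1

# Backpropagate: fills w.grad with d(loss)/d(w).
loss.backward()

print(loss.item(), w.grad)
```

Only tensors created with requires_grad=True receive a .grad; x and target are treated as constants.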
6. Optimizing model parameters
- In each training iteration, the model makes a guess at the output, calculates the error between the guess and the actual label, collects the derivatives of the error with respect to its parameters, and optimizes those parameters using gradient descent.
- Hyperparameters control the optimization process. Different hyperparameter values affect how the model trains and how fast it converges.
- Loss function: measures how far the prediction made from a given data sample is from the true label value.
- SGD optimizer
- Training loop: optimization happens in three steps
 · Call optimizer.zero_grad() to reset the gradients of the model parameters. Gradients accumulate (are summed) by default; to prevent double counting, we explicitly zero them on each iteration.
 · Backpropagate the prediction loss by calling loss.backward(). PyTorch stores the gradient of the loss with respect to each parameter.
 · Once we have the gradients, call optimizer.step() to adjust the parameters using the gradients collected during backpropagation.
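The three steps can be watched on a single parameter. This is a minimal sketch with made-up numbers: the loss is (3w - 5)^2 with w = 2, so its gradient is 2 * (3w - 5) * 3 = 6, and one SGD step with lr = 0.1 should move w to about 2 - 0.1 * 6 = 1.4.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)


def compute_loss():
    # Illustrative loss: (3w - 5)^2
    return (w * 3.0 - 5.0) ** 2


# Without zeroing, two backward passes add their gradients together.
compute_loss().backward()
compute_loss().backward()
accumulated = w.grad.item()   # 6 + 6

# Step 1: reset the gradients.
optimizer.zero_grad()

# Step 2: backpropagate once to get a fresh gradient.
compute_loss().backward()
single = w.grad.item()

# Step 3: update the parameter, w <- w - lr * grad.
optimizer.step()
updated = w.item()

print(accumulated, single, updated)
```

The doubled gradient after two backward calls is the double counting that zero_grad() prevents.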
Training loop code:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()  # clear the gradients held by the optimizer's parameters
        loss.backward()        # autograd computes the gradient of the loss w.r.t. each parameter
        optimizer.step()       # update the network parameters using those gradients

        # Report progress every 100 batches
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
The model parameters are updated with the gradient descent algorithm. This algorithm needs the gradient of the loss function with respect to the model parameters, and backpropagation is the process that calculates it.
loss.backward() differentiates the loss function and obtains the gradient of the loss with respect to each model parameter. The gradient indicates how strongly, and in which direction, each parameter affects the loss in the current state. The optimizer then moves each parameter in the opposite direction of its gradient, scaled by the learning rate.
The updated parameters are used for the next forward pass and the next backpropagation.
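To see the loop reduce a loss end to end, here is a minimal sketch on synthetic data. Everything specific in it is an assumption for illustration: the data follow y = 2x + 1, the model is a single nn.Linear(1, 1), and the learning rate, batch size, and epoch count are arbitrary.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data: y = 2x + 1 (illustrative only).
X = torch.rand(64, 1)
y = 2 * X + 1
dataloader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def full_loss():
    # Loss over the whole dataset, without tracking gradients.
    with torch.no_grad():
        return loss_fn(model(X), y).item()


initial_loss = full_loss()

for epoch in range(50):
    for Xb, yb in dataloader:
        pred = model(Xb)
        loss = loss_fn(pred, yb)

        optimizer.zero_grad()  # reset gradients
        loss.backward()        # compute new gradients
        optimizer.step()       # update parameters

final_loss = full_loss()
print(f"initial loss: {initial_loss:.4f}, final loss: {final_loss:.4f}")
```

Each pass uses the parameters left by the previous update, so the loss over the whole dataset should end lower than it started.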
7. Save and load the model
PyTorch models store their learned parameters in an internal state dictionary called state_dict. These parameters can be saved with the torch.save method.
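A minimal save-and-restore round trip might look like this. The small nn.Linear(4, 2) model and the file name model_weights.pth are assumptions for illustration; a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import torch
from torch import nn

# A small model standing in for a trained network (illustrative).
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_weights.pth")

    # Save only the learned parameters (the state_dict).
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the saved parameters.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))

# Switch to evaluation mode before inference.
restored.eval()

same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(list(model.state_dict().keys()), same)
```

Because only the parameters are stored, the loading side has to create a model with the same structure before calling load_state_dict.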