PyTorch basics (2)

Activation function

PyTorch's activation functions are in the nn module!

import torch
from torch import nn

x = torch.tensor([1, 2, 3, 4], dtype=float)  # dtype=float gives a float64 tensor
tanh = nn.Tanh()
y = tanh(x)
print(y)
Result:
tensor([0.7616, 0.9640, 0.9951, 0.9993], dtype=torch.float64)

ReLU, Sigmoid, etc. are also in nn; instantiate them and call the resulting object to apply the function.
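For example, the same pattern works for ReLU and Sigmoid (a quick sketch using only the standard nn module):

import torch
from torch import nn

x = torch.tensor([-1.0, 0.0, 1.0, 2.0])

relu = nn.ReLU()
sigmoid = nn.Sigmoid()

print(relu(x))     # negative values are clamped to 0
print(sigmoid(x))  # every value is squashed into (0, 1)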

Softmax

import torch
from torch import nn

softmax = nn.Softmax(dim=1)

For example, softmax(x) where x has shape=(64, 10): with dim=1, softmax is applied along dimension 1 (the dimension of size 10), i.e., each group of 10 values is normalized together.

[[x0, x1, x2, x3, x4, x5, x6, x7, x8, x9],
 [x0, x1, x2, x3, x4, x5, x6, x7, x8, x9],
 ...                        (64 rows in total)
 [x0, x1, x2, x3, x4, x5, x6, x7, x8, x9]]

64 here is the batch size!
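A minimal sketch of that shape, with random numbers standing in for real network outputs:

import torch
from torch import nn

x = torch.randn(64, 10)       # a batch of 64 samples, 10 scores each
softmax = nn.Softmax(dim=1)   # normalize each row of 10 scores
y = softmax(x)

print(y.shape)                # torch.Size([64, 10])
print(y.sum(dim=1)[:3])       # every row sums to 1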

For another example, when the batch size is 1, i.e., there is only one sample:

import torch
from torch import nn

x = torch.tensor([1, 2, 3, 4], dtype=float)
# For a 1-D tensor, dim=0 applies softmax across all elements
softmax = nn.Softmax(dim=0)
y = softmax(x)
print(y)

DataLoader

The dataset passed to DataLoader can be a list, a tensor, or any other indexable/iterable type; iterating over the DataLoader yields the elements batch by batch.

import torch
from torch.utils.data import DataLoader

x = torch.tensor([1, 2, 3, 4, 5, 6])
# Build y as one-hot vectors from x; tolist() is used so the whole y can later be
# converted to a tensor (torch.tensor() cannot convert a list whose elements are tensors)
y = [torch.zeros((6,)).tolist() for i in x]
y = torch.tensor(y)
# Put x and y into one dataset so the data can be retrieved conveniently after wrapping it in a DataLoader
dataset = []
for i in range(len(x)):
    # Turn each label into a one-hot vector
    y[i][x[i]-1] = 1.0
    # Pair the corresponding x and y as a tuple and append it to dataset: [(x0,y0),(x1,y1)...]
    dataset.append((x[i], y[i]))
print(y)
print(dataset)
print()

batch_size = 2
train_loader = DataLoader(dataset=dataset,
                          batch_size=batch_size,
                          shuffle=True)
for x, y in train_loader:
    print("x:", x)
    print("y:", y)
Result:
tensor([[1., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 0., 1.]])
[(tensor(1), tensor([1., 0., 0., 0., 0., 0.])), (tensor(2), tensor([0., 1., 0., 0., 0., 0.])), (tensor(3), tensor([0., 0., 1., 0., 0., 0.])), (tensor(4), tensor([0., 0., 0., 1., 0., 0.])), (tensor(5), tensor([0., 0., 0., 0., 1., 0.])), (tensor(6), tensor([0., 0., 0., 0., 0., 1.]))]

x: tensor([4, 2])
y: tensor([[0., 0., 0., 1., 0., 0.],
        [0., 1., 0., 0., 0., 0.]])
x: tensor([5, 1])
y: tensor([[0., 0., 0., 0., 1., 0.],
        [1., 0., 0., 0., 0., 0.]])
x: tensor([3, 6])
y: tensor([[0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1.]])
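As a side note, the same pairing can also be done with torch.utils.data.TensorDataset instead of a hand-built list of tuples; a small sketch with the same x and the same one-hot targets (torch.eye(6) produces exactly those rows):

import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.tensor([1, 2, 3, 4, 5, 6])
y = torch.eye(6)                       # the same one-hot targets, built directly

# TensorDataset pairs the i-th element of x with the i-th row of y
dataset = TensorDataset(x, y)
train_loader = DataLoader(dataset, batch_size=2, shuffle=True)

for xb, yb in train_loader:
    print("x:", xb)
    print("y:", yb)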

One-hot encoding (using scatter)

Besides building one-hot vectors by hand as above, you can also use the scatter method provided by tensors!

# This is a classification task with a batch of 6 samples and 6 classes
label = torch.tensor([1, 2, 3, 4, 5, 6])
# Turn it into a 2-D label tensor: [[1], [2], ..., [6]]
label = label.reshape(-1, 1)

# tensor.scatter(dim, index, src)
# dim: which dimension to write along
# index: the positions to write to
# src: the value(s) to write
one_hot = torch.zeros(label.shape[0], 6).scatter(1, label-1, 1)
# zeros() gives an all-zero tensor of shape (6, 6)
# Along dim 1, each row gets the value 1 written at the position given by the corresponding element of label-1
Result:
tensor([[1., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 0., 1.]])

If that is not clear, here is the single-sample case:

label = torch.tensor([1])

one_hot = torch.zeros(6).scatter(0, label-1, 1)
Result:
tensor([1., 0., 0., 0., 0., 0.])
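PyTorch also provides a ready-made helper, torch.nn.functional.one_hot, that does the same thing; a small sketch (it returns an integer tensor, so cast to float if your loss needs it):

import torch
import torch.nn.functional as F

label = torch.tensor([1, 2, 3, 4, 5, 6])
# one_hot expects classes starting at 0, hence label - 1
one_hot = F.one_hot(label - 1, num_classes=6).float()
print(one_hot)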

Dropout

import torch
from torch import nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # Hidden layer with 500 units, taking inputs of size 784
        self.hidden = nn.Linear(784, 500)
        # Zero out 50% of the values passed through it, which effectively makes the layer feeding it drop 50% of its units
        self.drop = nn.Dropout(p=0.5)
        # 10 output classes
        self.out = nn.Linear(500, 10)
        
    def forward(self, x):
        x = self.hidden(x)
        x = self.drop(x)
        y_hat = self.out(x)
        return y_hat
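Keep in mind that Dropout is only active in training mode; a small usage sketch, reusing the Model class above with dummy data:

model = Model()
x = torch.randn(64, 784)   # a dummy batch of 64 flattened 28x28 images

model.train()              # dropout is active: 50% of hidden activations are zeroed
y_train = model(x)

model.eval()               # dropout is disabled: all hidden activations pass through
y_eval = model(x)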

Weight decay

Weight decay is configured in the optimizer (not in the loss function). Backpropagation still uses the unchanged loss function; the derivative of the regularization term is simply added to the gradient when the optimizer updates the parameters (step), so the weights decay automatically whenever the parameters are updated (a hand-written sketch of this update follows the code below).

import torch
from torch import nn, optim
optimizer = optim.SGD(model.parameters(), lr, weight_decay=0.1)
  • weight_decay is the coefficient of the regularization term. (It determines how important the regularization term is; it is a hyperparameter of the model.)
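To make that concrete, here is a hand-written sketch of what SGD with weight_decay effectively does to a single parameter w (a toy loss, assuming plain SGD with L2 regularization; the built-in optimizer does this for you):

import torch

lr, weight_decay = 0.1, 0.1
w = torch.tensor([1.0], requires_grad=True)

loss = (w * 2).sum()   # toy loss; its gradient with respect to w is 2
loss.backward()

with torch.no_grad():
    # SGD with weight decay: w <- w - lr * (grad + weight_decay * w)
    w -= lr * (w.grad + weight_decay * w)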

Optimizer

The essence of all optimizers is gradient descent!

SGD

SGD is the basic algorithm.

  • Disadvantages: it cannot keep descending at a saddle point (this is the worst case, which is basically never encountered in practice)


Momentum

Gives gradient descent inertia (a momentum term).

  • Advantages: fast convergence
  • Disadvantages: with few training iterations the direction may be off (because it converges fast, it is hard to brake while searching for the direction); with enough iterations the direction is corrected

NAG

NAG (Nesterov Accelerated Gradient) is an improved version of Momentum (both can be enabled through optim.SGD, as shown in the sketch after this list).

  • Advantages: finds the right direction faster than Momentum
  • Disadvantages: with few training iterations the direction can still be off
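Both variants are options of optim.SGD; a minimal sketch (model and the learning rate are assumed to be defined elsewhere):

from torch import optim

# Plain momentum
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# NAG: the same momentum term with the Nesterov correction
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)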

Adam

Stochastic gradient descent keeps a single learning rate (alpha) for all weights, and that learning rate does not change during training. Adam, on the other hand, computes first-order and second-order moment estimates of the gradients and maintains an independent adaptive learning rate for each parameter (in other words, Adam effectively adjusts the learning rate during training).

  • Advantages: better and faster than SGD, more stable than other optimization algorithms (if you don't know which one to choose, use Adam)

import torch
from torch import nn, optim
optimizer = optim.Adam(model.parameters(), lr, weight_decay=0.1)

Using Adam optimization algorithm, lr can be appropriately adjusted down.
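For context, a minimal training step with Adam might look like this (model, train_loader, and a loss suited to your targets are assumed to be defined as in the earlier sections):

import torch
from torch import nn, optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.1)

for x, y in train_loader:
    optimizer.zero_grad()        # clear gradients from the previous step
    y_hat = model(x)             # forward pass
    loss = criterion(y_hat, y)   # y is assumed to be class indices here
    loss.backward()              # backpropagate
    optimizer.step()             # Adam updates the parameters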


Save model and load model

Save the model

Use torch.save() to save.

Saving a model means serializing it to disk. PyTorch uses Python's pickle module for serialization.

In PyTorch, models, tensors, and dictionaries can all be serialized to disk!

Function prototype: torch.save(obj, f, pickle_module=<module '...'>, pickle_protocol=2)

Parameters:
  • obj — the object to save
  • f — the file name string (or a file-like object) to save to
  • pickle_module — the module used for pickling metadata and objects
  • pickle_protocol — specifies the pickle protocol, overriding the default

Save format: .pt, .pth, or .pkl all work for a model; there is no difference in format, only the suffix differs.

Save the entire model

Saves both the structure and the parameters of the entire model.

When loading the entire model, there is no need to reconstruct the model class; the loaded object can be used directly as the model.

# .pt, .pth, .pkl, or .ckpt all work
torch.save(model, "model.pt")

Only save the parameters of the model

This saves only the model's parameters. When loading, you must first instantiate the model (to get the structure) and then load the parameters into it.

This requires the model class to be available (saving the entire model does not).

# .pt, .pth, .pkl, or .ckpt all work
torch.save(model.state_dict(), "model_params.pt")

Saving hyperparameters in the model file

torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss
}, "model.pt")

# Load
model = MyModel()
optimizer = optim.SGD(*args, **kwargs)

checkpoint = torch.load("model.pt")

model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
epoch = checkpoint["epoch"]
loss = checkpoint["loss"]

# or model.train()
model.eval()

Save multiple models to file

torch.save({
    "modelA_state_dict": modelA.state_dict(),
    "modelB_state_dict": modelB.state_dict()
}, "models_params.pt")

# Load
checkpoint = torch.load("models_params.pt")
modelA = MyModelA()
modelB = MyModelB()
modelA.load_state_dict(checkpoint["modelA_state_dict"])
modelB.load_state_dict(checkpoint["modelB_state_dict"])

modelA.train()
modelB.train()

Using a pre-trained model for fine-tuning

# Save pre-trained model A's parameters
torch.save(model.state_dict(), "model_param.pt")

# Load pre-trained model A's parameters into model B
model.load_state_dict(torch.load("model_param.pt"), strict=False)

This is very commonly used in transfer learning.

When loading only some of a pre-trained model's parameters, you will likely run into key mismatches (model weights are saved and loaded as key-value pairs). Whether keys are missing or extra, you can ignore the unmatched keys by setting strict=False in load_state_dict().

If you want to load the parameters of one layer into a different layer but the keys do not match, rename the keys in the state_dict so that they match, as sketched below.
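A minimal sketch of renaming keys before loading (the layer names "hidden" and "encoder" are hypothetical placeholders, and model is assumed to be instantiated already):

import torch

state_dict = torch.load("model_param.pt")

# Rename keys such as "hidden.weight" -> "encoder.weight" (hypothetical layer names)
renamed = {k.replace("hidden.", "encoder."): v for k, v in state_dict.items()}

model.load_state_dict(renamed, strict=False)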

Load the model

Loading a model in PyTorch uses Python's unpickling facilities to deserialize the model from disk into memory.

Load the entire model

Function prototype: torch.load(f, map_location=None, pickle_module=<module 'pickle' from '...'>)

Parameters:
  • f — the file name string (or a file-like object) to load from
  • map_location — a function, string, or dict specifying how to remap storage devices
  • pickle_module — the module used for unpickling metadata and objects (should match the pickle_module used when the file was serialized)

# .pt, .pth, or .pkl all work
# Load the model (tensors are restored to the devices they were saved from)
model = torch.load("model.pt")

# Load the model onto the CPU
model = torch.load("model.pt", map_location=torch.device('cpu'))

# Load the model onto the CPU using a function (lambda) form
model = torch.load("model.pt", map_location=lambda storage, loc: storage)

# Load the model onto GPU 1
model = torch.load("model.pt", map_location=lambda storage, loc: storage.cuda(1))

# Remap the model from GPU 1 to GPU 0
model = torch.load("model.pt", map_location={'cuda:1': 'cuda:0'})


# Note: be sure to call model.eval() to fix the dropout and normalization layers, otherwise inference results will differ every run
# or model.train()
model.eval()

Load model parameters

Function prototype: torch.nn.Module.load_state_dict(state_dict, strict=True)

Parameters:
  • state_dict — a dict containing the model's parameters
  • strict — whether the keys in state_dict must exactly match the keys returned by model.state_dict()

model = MyModel()
# Copy the loaded weights into the model's own parameters
model.load_state_dict(torch.load("model_params.pt"))


# Note: be sure to call model.eval() to fix the dropout and normalization layers, otherwise inference results will differ every run
# or model.train()
model.eval()

Note that you must call model.eval() to fix the dropout and normalization layers, otherwise each inference will produce different results! (Or use model.train() to return to training mode.)

Origin blog.csdn.net/qq_43477218/article/details/114179502