[Pytorch programming] Getting started with PyTorch: basic concepts and a first hands-on experience


This post is part of my blog column, the Pytorch programming series. For Python environment configuration, refer to "[Python Learning] Windows 10 Start Your Anaconda Installation and Python Environment Management" or "[Python Learning] Pure Terminal Commands to Start Your Anaconda Installation and Python Environment Management".

Author: Chen Yirong
Code environment: Python 3.6, PyTorch 1.4.0, Jupyter Notebook

Reference resources

Pytorch official website: https://pytorch.org/
PyTorch official tutorial Chinese version: https://www.pytorch123.com/

Environment configuration

Before configuring the Pytorch environment, you need to determine your development setup:

System: Linux, Mac, Windows
Package management tool: Conda, Pip, …
Language: Python, C++, Java
Computing resources: whether you have a graphics card, its model, the graphics driver version, and the CUDA version

These conditions differ from person to person.

As an example, my setup is a server running Ubuntu 18.04 with Conda as the package management tool, a GeForce RTX 2080 Ti graphics card, and NVIDIA driver version 495.46, which supports CUDA versions up to 11.5.

Generally speaking, the operating system and the graphics card model are fixed. The graphics card model determines which NVIDIA drivers are supported, and the driver version in turn determines the highest supported CUDA version. You can search for a suitable NVIDIA driver at https://www.nvidia.cn/geforce/drivers/; for driver configuration, refer to [Python Learning] Install CUDA and cuDNN from scratch on Ubuntu 18.04.


Once the driver environment is configured, you can install Pytorch with the package management tool. The installation commands look like this:

Install PyTorch 1.4.0 on Ubuntu 18.04 with CUDA 10.1:

conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch

Install PyTorch 1.6.0 on Ubuntu 18.04 with CUDA 10.2:

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch

These commands can be found on the official PyTorch website; for details, see https://pytorch.org/get-started/previous-versions/

View the Pytorch version used in the current conda environment

import torch
print(torch.__version__)  # note: double underscores on both sides

1.4.0
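Beyond the version string, a quick sanity check is whether this build can see the GPU and which CUDA version it was compiled against, which should be compatible with the driver configured above. A minimal sketch using standard torch attributes (torch.version.cuda is None on CPU-only builds):

```python
import torch

print("PyTorch version:", torch.__version__)
# CUDA version this PyTorch build was compiled with (None for CPU-only builds)
print("Built with CUDA:", torch.version.cuda)

# Whether a usable GPU and driver were found at runtime
cuda_ok = torch.cuda.is_available()
print("CUDA available:", cuda_ok)

if cuda_ok:
    # List every visible GPU by index and name
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```

If "CUDA available" prints False on a machine that has a GPU, common causes are a CPU-only build (torch.version.cuda is None) or a driver that is too old for the CUDA version the build targets.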

A first hands-on experience with Pytorch

Import the required packages

torch is the top-level package. Commonly used packages include:

torch.nn: provides the basic building blocks for neural networks; see https://pytorch.org/docs/1.4.0/nn.html for details
torch.utils.data: provides the classes needed for data loading; see https://pytorch.org/docs/1.4.0/data.html for details
torchvision: provides popular datasets, model architectures, and image transforms for computer vision; see https://pytorch.org/docs/1.4.0/torchvision/index.html for details
torchtext: provides popular datasets, models, etc. for natural language processing; see https://pytorch.org/text/stable/index.html for details

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

Download the FashionMNIST dataset

For details on this dataset, see https://pytorch.org/docs/1.4.0/torchvision/datasets.html#fashion-mnist
Running the code below creates a directory named data in the same directory as this notebook, with a FashionMNIST subdirectory inside it, so the dataset is stored in:

./data/FashionMNIST

# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data", # root directory where the downloaded dataset is stored
    train=True, # download the training split
    download=True, # If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data", # root directory where the downloaded dataset is stored
    train=False, # download the test split
    download=True,
    transform=ToTensor(),
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw
Processing...
Done!
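The transform=ToTensor() argument converts each image before it is returned. According to the torchvision documentation, ToTensor turns a PIL image or a uint8 NumPy array of shape (H, W, C) with values in [0, 255] into a float tensor of shape (C, H, W) scaled to [0.0, 1.0]. The sketch below reproduces that conversion with plain torch operations on a made-up 28x28 single-channel image; to_tensor_sketch is a hypothetical helper written for illustration, not torchvision's implementation:

```python
import torch

def to_tensor_sketch(img_hwc: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: mimic ToTensor for a uint8 (H, W, C) image."""
    # Move channels first: (H, W, C) -> (C, H, W)
    chw = img_hwc.permute(2, 0, 1)
    # Convert to float32 and rescale from [0, 255] to [0.0, 1.0]
    return chw.float().div(255)

# A fake 28x28 grayscale image (one channel) with random pixel values
fake_image = torch.randint(0, 256, (28, 28, 1)).to(torch.uint8)
x = to_tensor_sketch(fake_image)
print(x.shape, x.dtype)  # torch.Size([1, 28, 28]) torch.float32
print(float(x.min()) >= 0.0, float(x.max()) <= 1.0)  # True True
```

This is why each FashionMNIST sample later shows up with shape [1, 28, 28]: one channel first, then height and width.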

View dataset dimensions

torch.utils.data.DataLoader is the data loading class. Its constructor looks like this:

DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
       batch_sampler=None, num_workers=0, collate_fn=None,
       pin_memory=False, drop_last=False, timeout=0,
       worker_init_fn=None)

Here, dataset is an instance of the torch.utils.data.Dataset class or one of its subclasses, and batch_size is the number of samples in each batch.

Before training the network, we need to create data loaders. The code below initializes two of them: train_dataloader and test_dataloader.

batch_size = 64

# Create data loaders for the training set and the test set
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
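To see how batch_size and drop_last shape the batches without downloading anything, here is a sketch on a synthetic TensorDataset (the 150 random "images" and labels are made up; TensorDataset is a standard torch.utils.data class). With 150 samples and batch_size=64, the last batch holds only the 22 leftover samples, unless drop_last=True discards it:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for FashionMNIST: 150 fake 1x28x28 images with labels 0-9
images = torch.rand(150, 1, 28, 28)
labels = torch.randint(0, 10, (150,))
dataset = TensorDataset(images, labels)

# batch_size=64 -> two full batches and a final partial batch of 150 - 128 = 22
loader = DataLoader(dataset, batch_size=64, shuffle=True)
batch_sizes = [X.shape[0] for X, y in loader]
print(batch_sizes)  # [64, 64, 22]

# drop_last=True discards the incomplete final batch
loader_drop = DataLoader(dataset, batch_size=64, drop_last=True)
print([X.shape[0] for X, y in loader_drop])  # [64, 64]
print(len(loader), len(loader_drop))  # 3 2
```

By the same arithmetic, the 10,000-image FashionMNIST test set gives 157 batches with batch_size=64, the last one holding 16 images (10000 - 156 * 64).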

Define a neural network

Defining a neural network is not complicated. In essence, you create a class that inherits from torch.nn.Module, the base class of all neural network modules, declare its layers in __init__, and implement the forward pass in forward. The general format is:

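class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        # initialize the model's submodules (layers) here

    def forward(self, x):
        # the model's forward pass: compute the output from x
        logits = x  # placeholder; replace with the actual computation
        return logits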

# Use torch.cuda.is_available() to check whether a GPU is available and choose the device accordingly
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# Define the neural network class; it must inherit from nn.Module
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten() # flattens each sample into one dimension; equivalent to torch.nn.Flatten(start_dim=1, end_dim=-1)
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512), # linear layer: input [batch_size, 28*28], output [batch_size, 512]
            nn.ReLU(), # activation function
            nn.Linear(512, 512), # linear layer: input [batch_size, 512], output [batch_size, 512]
            nn.ReLU(), # activation function
            nn.Linear(512, 10) # linear layer: input [batch_size, 512], output [batch_size, 10]
        )

    def forward(self, x):
        x = self.flatten(x) # flatten from the second dimension of x onward: [64, 1, 28, 28] ---> [64, 1*28*28]
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

Using cuda device
NeuralNetwork(
  (flatten): Flatten()
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
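The shape comments in the class can be checked by pushing a dummy batch through the same layers one at a time. A sketch, assuming a random [64, 1, 28, 28] tensor in place of a real FashionMNIST batch and rebuilding the layers so it runs on its own:

```python
import torch
from torch import nn

# Same layers as NeuralNetwork above, rebuilt so this sketch is self-contained
flatten = nn.Flatten()
stack = nn.Sequential(
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

X = torch.rand(64, 1, 28, 28)  # random batch shaped like one FashionMNIST batch
h = flatten(X)
shapes = [("Flatten", tuple(h.shape))]
# Iterating over nn.Sequential yields its submodules in order
for layer in stack:
    h = layer(h)
    shapes.append((type(layer).__name__, tuple(h.shape)))

for name, shape in shapes:
    print(name, shape)
# Flatten (64, 784)
# Linear (64, 512)
# ReLU (64, 512)
# Linear (64, 512)
# ReLU (64, 512)
# Linear (64, 10)
```

The final [64, 10] tensor holds one raw score (logit) per class for each image in the batch.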

Calculate the total model parameters and trainable parameters

def count_trainable_parameters(model):
    '''Return the number of trainable parameters (those with requires_grad=True).
    Usage example: print(f'The model has {count_trainable_parameters(model):,} trainable parameters')
    '''
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def count_total_parameters(model):
    '''Return the total number of parameters in the model.
    Usage example: print(f'The model has {count_total_parameters(model):,} total parameters')
    '''
    return sum(p.numel() for p in model.parameters())

total_params = count_total_parameters(model)
print(f'{total_params:,} total parameters.')
total_trainable_params = count_trainable_parameters(model)
print(f'{total_trainable_params:,} total trainable parameters.')

669,706 total parameters.
669,706 total trainable parameters.
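The figure 669,706 can be checked by hand: each nn.Linear(n_in, n_out) holds an n_out x n_in weight matrix plus n_out biases. The two counts agree here because every parameter has requires_grad=True. The sketch below also freezes the first layer (a hypothetical tweak, not part of the original model) to show how the counts diverge:

```python
from torch import nn

def linear_params(n_in, n_out):
    # weight matrix (n_out x n_in) plus one bias per output feature
    return n_in * n_out + n_out

per_layer = [linear_params(784, 512), linear_params(512, 512), linear_params(512, 10)]
print(per_layer, sum(per_layer))  # [401920, 262656, 5130] 669706

# Same layer sizes as the model above
stack = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)
# Hypothetical: freeze the weight and bias of the first Linear layer
stack[0].requires_grad_(False)

total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)
print(total, trainable)  # 669706 267786
```

Freezing changes only the trainable count (669,706 - 401,920 = 267,786); the total stays the same.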

Define loss function and optimizer

loss_fn = nn.CrossEntropyLoss()  # combines nn.LogSoftmax() and nn.NLLLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3) # torch.optim.SGD optimizer with a fixed learning rate of 0.001
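The comment on nn.CrossEntropyLoss can be checked numerically. A sketch on made-up logits and targets comparing it with nn.LogSoftmax followed by nn.NLLLoss. Because the loss applies log-softmax internally, it expects raw logits, which is why the network above ends with a Linear layer and no softmax:

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # fake model outputs: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # fake class indices

ce = nn.CrossEntropyLoss()(logits, targets)
# Same value in two steps: log-probabilities, then negative log-likelihood
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

print(ce.item(), nll.item())
print(torch.allclose(ce, nll))  # True
```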

Define training process and testing process

Training requires a training data loader, a model, a loss function, and an optimizer. Each iteration can be summarized as:

Load a batch of data and move it to the device
Run the model on the batch (forward pass)
Compute the loss value with the loss function
Reset the gradients to zero, backpropagate the loss, and update all parameters with the optimizer

The testing process is similar to the training process, but it skips the last step: it puts the model in evaluation mode with model.eval(), runs under torch.no_grad(), and accumulates the loss and accuracy instead of updating parameters.
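The four training steps map onto a handful of calls. A minimal sketch of a single iteration, assuming a tiny stand-in model (nn.Linear(4, 3)) and a random batch instead of the FashionMNIST network and data; it also checks that plain SGD (no momentum, no weight decay) applies the update w <- w - lr * grad:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 3)                # tiny stand-in model
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

X = torch.randn(8, 4)                  # 1. a fake batch of 8 samples
y = torch.randint(0, 3, (8,))

pred = model(X)                        # 2. forward pass
loss = loss_fn(pred, y)                # 3. loss value

optimizer.zero_grad()                  # 4a. clear gradients from the previous step
loss.backward()                        # 4b. compute new gradients
w_before = model.weight.detach().clone()
grad = model.weight.grad.detach().clone()
optimizer.step()                       # 4c. update the parameters

# Plain SGD: each parameter moves against its gradient, scaled by lr
print(torch.allclose(model.weight.detach(), w_before - 1e-3 * grad))  # True
```

Calling zero_grad() matters because backward() accumulates into .grad rather than overwriting it.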

# Training process
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

# Testing process
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
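The accuracy line in test() counts a prediction as correct when the index of the largest logit equals the label; test() then sums these counts over all batches and divides by the dataset size. A worked example on hand-picked logits (invented for illustration):

```python
import torch

# Made-up logits for 3 samples and 3 classes, plus their true labels
pred = torch.tensor([[2.0, 0.5, 0.1],
                     [0.1, 0.2, 3.0],
                     [1.0, 2.0, 0.0]])
y = torch.tensor([0, 2, 0])

predicted_class = pred.argmax(1)   # index of the largest logit in each row
print(predicted_class)             # tensor([0, 2, 1])

# Same expression as in test(): booleans -> floats -> count of correct predictions
correct = (predicted_class == y).type(torch.float).sum().item()
print(correct, correct / len(y))   # 2.0 0.6666666666666666
```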

Perform model training

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.312360  [    0/60000]
loss: 2.291506  [ 6400/60000]
loss: 2.271516  [12800/60000]
loss: 2.252658  [19200/60000]
loss: 2.244356  [25600/60000]
loss: 2.222664  [32000/60000]
loss: 2.228115  [38400/60000]
loss: 2.203855  [44800/60000]
loss: 2.192679  [51200/60000]
loss: 2.153655  [57600/60000]
Test Error: 
 Accuracy: 38.2%, Avg loss: 2.149532 

Epoch 2
-------------------------------
loss: 2.169748  [    0/60000]
loss: 2.151293  [ 6400/60000]
loss: 2.094591  [12800/60000]
loss: 2.099757  [19200/60000]
loss: 2.060622  [25600/60000]
loss: 2.005261  [32000/60000]
loss: 2.027278  [38400/60000]
loss: 1.956725  [44800/60000]
loss: 1.939770  [51200/60000]
loss: 1.872965  [57600/60000]
Test Error: 
 Accuracy: 57.6%, Avg loss: 1.870182 

Epoch 3
-------------------------------
loss: 1.906271  [    0/60000]
loss: 1.869101  [ 6400/60000]
loss: 1.755461  [12800/60000]
loss: 1.784045  [19200/60000]
loss: 1.698693  [25600/60000]
loss: 1.647050  [32000/60000]
loss: 1.658152  [38400/60000]
loss: 1.569280  [44800/60000]
loss: 1.572960  [51200/60000]
loss: 1.469435  [57600/60000]
Test Error: 
 Accuracy: 63.3%, Avg loss: 1.492781 

Epoch 4
-------------------------------
loss: 1.561221  [    0/60000]
loss: 1.521496  [ 6400/60000]
loss: 1.376429  [12800/60000]
loss: 1.439268  [19200/60000]
loss: 1.347573  [25600/60000]
loss: 1.329434  [32000/60000]
loss: 1.343370  [38400/60000]
loss: 1.273368  [44800/60000]
loss: 1.297012  [51200/60000]
loss: 1.199990  [57600/60000]
Test Error: 
 Accuracy: 64.4%, Avg loss: 1.230259 

Epoch 5
-------------------------------
loss: 1.306296  [    0/60000]
loss: 1.285495  [ 6400/60000]
loss: 1.123326  [12800/60000]
loss: 1.221825  [19200/60000]
loss: 1.122804  [25600/60000]
loss: 1.131946  [32000/60000]
loss: 1.158367  [38400/60000]
loss: 1.095908  [44800/60000]
loss: 1.125929  [51200/60000]
loss: 1.049075  [57600/60000]
Test Error: 
 Accuracy: 65.0%, Avg loss: 1.070740 

Done!

Save the model

import os

SAVE_PATH = "./about_pytorch_model"
if not os.path.exists(SAVE_PATH):
    os.makedirs(SAVE_PATH)

torch.save(model.state_dict(), os.path.join(SAVE_PATH,"model.pth"))
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
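Since torch.save(model.state_dict(), ...) stores only the parameter tensors and not the class definition, reloading means constructing the same architecture first and then calling load_state_dict. A self-contained round-trip sketch, assuming a small stand-in model (make_model is a hypothetical helper) and a temporary directory instead of ./about_pytorch_model:

```python
import os
import tempfile
import torch
from torch import nn

def make_model():
    # Hypothetical stand-in; the same pattern applies to NeuralNetwork above
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

model = make_model()
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pth")
    torch.save(model.state_dict(), path)   # saves parameters only, not the class

    restored = make_model()                # the architecture must be rebuilt in code
    # map_location="cpu" lets a checkpoint saved on a GPU load on a CPU-only machine
    restored.load_state_dict(torch.load(path, map_location="cpu"))

# Every saved tensor should match its restored counterpart exactly
same = all(torch.equal(a, b) for a, b in
           zip(model.state_dict().values(), restored.state_dict().values()))
print(same)  # True
```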

Load the model

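model = NeuralNetwork()
# map_location="cpu" loads the GPU-saved tensors onto the CPU, matching this CPU model
model.load_state_dict(torch.load(os.path.join(SAVE_PATH,"model.pth"), map_location="cpu"))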

<All keys matched successfully>

Test the model

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
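test_data[0][0] has shape [1, 28, 28], and the code above works because nn.Flatten() treats that leading channel dimension as the batch dimension. Making the batch dimension explicit with unsqueeze(0) and applying softmax to the logits gives per-class probabilities. A sketch with an untrained stand-in model and a random image, so the printed classes and percentages are meaningless:

```python
import torch
from torch import nn

classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Untrained stand-in with the same input/output sizes as NeuralNetwork
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

x = torch.rand(1, 28, 28)               # shaped like test_data[0][0]
with torch.no_grad():
    logits = model(x.unsqueeze(0))      # explicit batch dim: [1, 1, 28, 28] -> [1, 10]
    probs = torch.softmax(logits, dim=1)[0]
    top_p, top_i = probs.topk(3)        # three most likely classes

for p, i in zip(top_p.tolist(), top_i.tolist()):
    print(f"{classes[i]}: {p:.1%}")
```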