Common PyTorch functions for deep learning, and how to build an image classification model

  • The most basic object in PyTorch is the Tensor, a multi-dimensional array similar to a NumPy ndarray. Tensors can be moved to the GPU to accelerate computation, and converting between tensors and NumPy arrays is straightforward.
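A minimal sketch of the NumPy round trip and a GPU move. It assumes only that `torch` and `numpy` are installed and falls back to the CPU when no GPU is present; note that `torch.from_numpy` and `.numpy()` share memory with the original CPU data rather than copying it.

```python
import numpy as np
import torch

# torch.from_numpy shares memory with the NumPy array (CPU only, no copy)
arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
t = torch.from_numpy(arr)      # ndarray -> tensor
t.mul_(2)                      # modify the tensor in place...
print(arr)                     # ...and the NumPy array sees the change

back = t.numpy()               # tensor -> ndarray (also shares memory for CPU tensors)

# Run the computation on the GPU if one is available; .numpy() needs a CPU tensor
device = "cuda" if torch.cuda.is_available() else "cpu"
t_dev = t.to(device)
result = (t_dev @ t_dev).cpu().numpy()   # matrix product on the device, copied back to the CPU
print(result)
```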

  • torch.nn provides ready-made layers and a range of activation functions, so complex models can be assembled quickly, like building blocks.

  • To build a model in PyTorch, define a class that inherits from nn.Module and implement two methods: 1. the initializer (__init__), which creates the layers the network needs; 2. forward, which defines the forward pass through those layers.

  • When a layer is created, its weights and biases are initialized randomly.

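  • The output size of a feature map after convolution is N = (W − F + 2P) / S + 1, where W is the input height/width, F the kernel size, P the padding, and S the stride. When the division is not exact, PyTorch rounds down.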
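The formula can be checked with a few lines of plain Python. This is a sketch assuming dilation 1 and PyTorch's default rounding down (floor division); the helper name `conv_output_size` is hypothetical. The same formula also applies to pooling layers.

```python
def conv_output_size(w: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output height/width of a conv or pooling layer: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1

# LeNet conv1: 32x32 input, 5x5 kernel, no padding, stride 1 -> 28x28
print(conv_output_size(32, 5))               # 28
# 2x2 max pooling with stride 2 -> 14x14
print(conv_output_size(28, 2, s=2))          # 14
# AlexNet conv1: 224x224 input, 11x11 kernel, padding 2, stride 4 -> 55x55
print(conv_output_size(224, 11, p=2, s=4))   # 55
# AlexNet max pooling: 3x3 window, stride 2 -> 27x27
print(conv_output_size(55, 3, s=2))          # 27
```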

  • The dimension order of a PyTorch image tensor is [batch, channel, height, width].

  • CNN review: 1. Each convolution kernel has as many layers (depth) as the input has channels, and the number of kernels equals the number of output channels. During convolution, each layer of a kernel is convolved with the corresponding channel of the input feature map, and the results are summed to produce one output channel. 2. The pooling layer shrinks the feature map, which reduces the computation and the number of parameters in later layers (for example the fully connected layers) and makes the features more robust to small shifts in the input. Pooling itself has no learnable parameters.

  • A 2×2 max-pooling layer with stride 2 halves the height and width. Pooling changes only the height and width, never the depth (number of channels) of the feature map.
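The two points above can be seen directly in tensor shapes. This sketch uses LeNet's first convolution as the example: the weight tensor is stored as [out_channels, in_channels, kH, kW], so 16 kernels each have 3 layers, and 2×2 pooling halves height and width while keeping the channels.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 3, 32, 32)      # [batch, channel, height, width]
conv = nn.Conv2d(3, 16, 5)        # 16 kernels, each with 3 layers (one per input channel)
pool = nn.MaxPool2d(2, 2)         # 2x2 window, stride 2

print(conv.weight.shape)          # torch.Size([16, 3, 5, 5])
y = conv(x)
print(y.shape)                    # torch.Size([1, 16, 28, 28])
z = pool(y)
print(z.shape)                    # torch.Size([1, 16, 14, 14]): H and W halved, 16 channels kept
print(sum(p.numel() for p in pool.parameters()))  # 0: pooling has no learnable parameters
```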

  • To test a model, you can feed it a random tensor in place of a real image batch, for example:

> input = torch.rand([32, 3, 32, 32])  # batch=32, channel=3, height=32, width=32

> model = LeNet()  # instantiate the model

> output = model(input)  # output shape: [32, 10]

  • super() gives access to the parent class. In super(LeNet, self).__init__(), the parent class's (nn.Module's) initializer runs first, and then the rest of the subclass's __init__ executes.
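A plain-Python analogue of why super().__init__() comes first. The classes `Base` and `Child` are hypothetical; in PyTorch, nn.Module.__init__ similarly sets up the internal containers that register layers, which is why assigning self.conv1 before calling it raises an error.

```python
class Base:
    def __init__(self):
        self.registry = []                # state the parent initializer sets up
        print("Base.__init__ ran")

class Child(Base):
    def __init__(self):
        super().__init__()                # same as super(Child, self).__init__()
        self.registry.append("conv1")     # works because Base.__init__ already created the list
        print("Child.__init__ continues")

c = Child()
# Base.__init__ ran
# Child.__init__ continues
```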

  • Convolutional layers are defined with nn.Conv2d. Its positional parameters are, in order: input depth (in_channels), number of kernels (out_channels), and kernel size (kernel_size); stride defaults to 1 and padding to 0.

  • Downsampling layers are defined with nn.MaxPool2d. Its parameters are, in order: pooling window size (kernel_size), stride, and padding. If stride is omitted, it defaults to kernel_size.

  • Fully connected layers are defined with nn.Linear. A fully connected layer expects a one-dimensional vector per sample, so the feature maps must be flattened before they are passed in.

  • The number of outputs of the last fully connected layer must equal the number of classes in your own classification task.

  • x = F.relu(self.conv1(x)) passes the input through the first convolutional layer and then through the ReLU activation function.

  • view reshapes a tensor; its arguments are the new shape. In x.view(-1, 32*5*5), -1 tells PyTorch to infer the first dimension (the batch size), and the second argument is the length of each flattened sample.
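A short sketch of the flattening step in LeNet, assuming a batch of 4 feature maps with the shape that comes out of pool2. The -1 lets view work for any batch size, and the flattened length must match the in_features of the following nn.Linear.

```python
import torch
import torch.nn as nn

x = torch.rand(4, 32, 5, 5)        # 4 feature maps, shape after LeNet's pool2
flat = x.view(-1, 32 * 5 * 5)      # -1: infer the batch dimension (here 4)
print(flat.shape)                  # torch.Size([4, 800])

fc1 = nn.Linear(32 * 5 * 5, 120)   # in_features must equal the flattened length
print(fc1(flat).shape)             # torch.Size([4, 120])
```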

  • The model.py code for LeNet, the simplest of these networks, is as follows:

import torch.nn as nn
import torch.nn.functional as F


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):            # x is the input data, a tensor of shape [batch, 3, 32, 32]
        x = F.relu(self.conv1(x))    # input(3, 32, 32) output(16, 28, 28): first conv layer, then ReLU
        x = self.pool1(x)            # output(16, 14, 14)
        x = F.relu(self.conv2(x))    # output(32, 10, 10)
        x = self.pool2(x)            # output(32, 5, 5)
        x = x.view(-1, 32*5*5)       # output(32*5*5): flatten; -1 lets PyTorch infer the batch dimension
        x = F.relu(self.fc1(x))      # output(120)
        x = F.relu(self.fc2(x))      # output(84)
        x = self.fc3(x)              # output(10): no softmax here, because nn.CrossEntropyLoss
                                     # applies a more efficient (log-)softmax internally
        return x
  • torchvision is PyTorch's companion library for computer vision and is used mainly to build vision models; torchvision.transforms provides common image transformations. torchvision consists of the following parts:

> 1.torchvision.datasets: Some functions for loading data and commonly used data set interfaces;

> 2.torchvision.models: Contains commonly used model structures (including pre-trained models), such as AlexNet, VGG, ResNet, etc.;

> 3.torchvision.transforms: commonly used image transformations, such as cropping, rotation, etc.;

> 4.torchvision.utils: Some other useful methods.

> torchvision.transforms.Compose() chains several image transformations together. Its argument is a list whose elements are the transforms to apply, in order.

> transforms.Normalize: normalizes the data channel by channel, first subtracting the mean and then dividing by the standard deviation.

> transforms.ToTensor: converts a PIL Image or ndarray (H × W × C) into a tensor (C × H × W) and scales the values to [0, 1]. Note: the scaling simply divides by 255 (for uint8 input), so if your ndarray uses a different value range, you need to rescale it yourself.

> transforms.Resize: resizes the image to the given size.

> transforms.RandomResizedCrop: randomly crop the given image to different sizes and aspect ratios, and then scale the cropped image to the specified size

> transforms.Grayscale: Convert to grayscale

> transforms.RandomHorizontalFlip: flips the given image horizontally at random with probability p (default 0.5).

> About optimizer.zero_grad(): call it once per batch to clear the gradients left over from the previous batch; otherwise the new gradients are added (accumulated) to the old ones. The batch size is normally chosen according to the hardware: a larger batch often trains better, but limited memory frequently makes a large batch impossible. Deliberately not clearing the gradients can simulate a large batch: compute the loss gradients of several small batches, let them accumulate, and only then update the parameters (and clear the gradients). This is roughly equivalent to back-propagating the loss gradient of one large batch.
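A sketch of gradient accumulation on a toy linear model with random data (all assumptions for illustration). It checks that two half-batches, each loss divided by the number of accumulation steps, produce the same gradient as one full batch. In a real training loop you would call optimizer.step() and then optimizer.zero_grad() only after every accum_steps batches.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
loss_fn = nn.CrossEntropyLoss()          # averages over the batch by default
inputs = torch.randn(8, 4)
labels = torch.randint(0, 2, (8,))

# Gradient of one "large" batch of 8
model.zero_grad()
loss_fn(model(inputs), labels).backward()
full_grad = model.weight.grad.clone()

# Same gradient from 2 small batches of 4: no zero_grad between them, so .grad accumulates
model.zero_grad()
accum_steps = 2
for chunk_x, chunk_y in zip(inputs.chunk(accum_steps), labels.chunk(accum_steps)):
    loss = loss_fn(model(chunk_x), chunk_y) / accum_steps   # scale so the sum equals the full-batch mean
    loss.backward()                                         # adds into model.weight.grad
print(torch.allclose(full_grad, model.weight.grad, atol=1e-6))  # True
```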

  • A general training script for a classification network, with comments:

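import torch
import torch.nn as nn
import torch.optim as optim

# Assumes LeNet (from model.py), train_loader, val_image and val_label are defined earlier in the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = LeNet()
net.to(device)
loss_function = nn.CrossEntropyLoss()  # loss function; softmax is built in
optimizer = optim.Adam(net.parameters(), lr=0.001)  # optimizer; arg 1: parameters to train (here all trainable parameters); arg 2: learning rate
val_image, val_label = val_image.to(device), val_label.to(device)  # validation data must be on the same device

for epoch in range(5):
    running_loss = 0.0  # accumulates the training loss
    for step, data in enumerate(train_loader):  # iterate over the training set; enumerate() yields the index and the data
        inputs, labels = data  # split the batch into input images and their labels
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()  # clear the gradients from the previous batch
        outputs = net(inputs)  # forward pass
        loss = loss_function(outputs, labels)  # loss between the predictions and the true labels
        loss.backward()  # back-propagate the loss
        optimizer.step()  # update the parameters
        running_loss += loss.item()  # add this batch's loss
        if step % 500 == 499:  # print every 500 steps
            with torch.no_grad():  # no gradients are computed in this block, which saves memory during validation
                outputs = net(val_image)  # forward pass on the validation set
                predict_y = torch.max(outputs, dim=1)[1]  # predicted class: index of the max along dim 1 (dim 0 is the batch)
                # [1] keeps only the indices; the max values themselves are not needed
                # predict_y == val_label is True where the prediction matches the true label
                # sum() counts the correct predictions and item() turns the tensor into a Python number;
                # dividing by the number of samples gives the validation accuracy
                accuracy = (predict_y == val_label).sum().item() / val_label.size(0)
                # epoch, step, average training loss over the last 500 steps, validation accuracy
                print('[%d, %5d] train_loss: %.3f  test_accuracy: %.3f' %
                      (epoch + 1, step + 1, running_loss / 500, accuracy))
                running_loss = 0.0

print('Finished Training')
save_path = './·····.pth'
torch.save(net.state_dict(), save_path)  # save all of the network's parameters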
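The accuracy computation inside the loop can be checked on made-up numbers. This sketch uses hand-written logits for 4 samples and 3 classes; torch.max(..., dim=1)[1] returns the same indices as argmax(dim=1).

```python
import torch

outputs = torch.tensor([[2.0, 0.1, -1.0],   # made-up logits: 4 samples, 3 classes
                        [0.2, 0.3, 3.0],
                        [1.5, 2.5, 0.0],
                        [0.0, 0.0, 1.0]])
labels = torch.tensor([0, 2, 0, 1])

predict_y = torch.max(outputs, dim=1)[1]               # indices of the max along dim 1
same = torch.equal(predict_y, outputs.argmax(dim=1))   # argmax gives the same indices
accuracy = (predict_y == labels).sum().item() / labels.size(0)
print(predict_y, accuracy)                             # tensor([0, 2, 1, 2]) 0.5
```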
  • A general prediction script for a classification network, with comments:

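import torch
import torchvision.transforms as transforms
from PIL import Image

from model import LeNet  # assumes LeNet is defined in model.py
# `classes` is assumed to hold the class names in label order, defined earlier in the script

transform = transforms.Compose(
    [transforms.Resize((32, 32)),
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
net = LeNet()
net.load_state_dict(torch.load('Lenet.pth', map_location='cpu'))  # load the saved weights
net.eval()  # evaluation mode
im = Image.open('1.jpg').convert('RGB')  # make sure the image has 3 channels
im = transform(im)  # [C, H, W]
im = torch.unsqueeze(im, dim=0)  # [N, C, H, W]: add a batch dimension
with torch.no_grad():  # no gradients are needed for inference
    outputs = net(im)
    predict = torch.max(outputs, dim=1)[1].item()  # class index for the input image
print(classes[predict])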
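The network returns raw scores (logits) because the softmax lives inside nn.CrossEntropyLoss. If you also want a confidence value at prediction time, you can apply torch.softmax yourself. A small sketch with made-up logits and hypothetical class names:

```python
import torch

classes = ("plane", "car", "bird")          # hypothetical class names for illustration
logits = torch.tensor([[1.0, 3.0, 0.5]])    # made-up network output for one image: [N, num_classes]

probs = torch.softmax(logits, dim=1)        # turn logits into probabilities that sum to 1
conf, idx = torch.max(probs, dim=1)
print(classes[idx.item()], round(conf.item(), 3))
```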
  • nn.Sequential packs a series of layers into a single new module. It is commonly used in a model's __init__ to define groups of layers, and it is more concise than declaring every layer individually, as in self.conv1 = nn.Conv2d(3, 16, 5).

  • In nn.ReLU(inplace=True), inplace=True makes the operation overwrite its input tensor instead of allocating a new output tensor. This lowers memory usage, which can allow a larger model to fit in memory.
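A quick way to see what inplace=True does, using a plain tensor that does not require gradients: the input itself is modified and the output uses the same storage.

```python
import torch
import torch.nn as nn

x = torch.tensor([-1.0, 0.5, -2.0, 3.0])
ptr_before = x.data_ptr()

out = nn.ReLU(inplace=True)(x)       # overwrites x instead of allocating a new tensor
print(x)                             # tensor([0.0000, 0.5000, 0.0000, 3.0000]): x itself changed
print(out.data_ptr() == ptr_before)  # True: same storage, no extra memory
```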

  • nn.Sequential structure example:

import torch
import torch.nn as nn


class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224]  output[48, 55, 55]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[48, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),           # output[128, 27, 27]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),          # output[128, 13, 13]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 6, 6]
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(128 * 6 * 6, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):  # explained below
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')  # explained below
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
  • self.modules() returns an iterator over all modules in the network (the network itself and every layer, recursively), and isinstance() checks whether an object is of a given type.

  • nn.init.kaiming_normal_ and nn.init.normal_ are both weight-initialization functions: the former uses Kaiming (He) initialization, the latter draws values from a normal distribution. These statements are not strictly necessary and are included here for learning purposes; by default, PyTorch already initializes convolutional and fully connected layers with a Kaiming-based (kaiming_uniform_) scheme.
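The same initialization pattern applied to a small hypothetical nn.Sequential model (the layer sizes are arbitrary assumptions). It shows that modules() yields the container itself plus every layer, and that isinstance picks out the layers to initialize.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 10),
)

def initialize_weights(module: nn.Module) -> None:
    # modules() yields the module itself and every submodule, recursively
    for m in module.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, 0, 0.01)
            nn.init.constant_(m.bias, 0)

initialize_weights(model)
print([type(m).__name__ for m in model.modules()])
# ['Sequential', 'Conv2d', 'ReLU', 'Flatten', 'Linear']
```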

  • Dropout randomly deactivates (zeroes) a given proportion of a fully connected layer's neurons during training to reduce overfitting. It is usually placed between fully connected layers. The p in nn.Dropout(p=0.5) is the probability that each neuron is dropped; the default is 0.5.
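A sketch of Dropout in training and evaluation mode on a tensor of ones. In training mode, PyTorch zeroes elements with probability p and scales the survivors by 1/(1 − p), so with p = 0.5 every output is 0 or 2; in evaluation mode it passes the input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(10)

drop.train()              # training mode: zero ~50% of elements, scale the rest by 2
train_out = drop(x)
print(train_out)          # a mix of 0. and 2.

drop.eval()               # evaluation mode: dropout does nothing
eval_out = drop(x)
print(eval_out)           # all ones
```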

  • The two arguments of nn.Linear(2048, 2048) are the number of input neurons (in_features) and the number of output neurons (out_features).

  • torch.flatten flattens a tensor starting from a chosen dimension (start_dim); start_dim=1 keeps the batch dimension intact.
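A comparison of start_dim=1 (as used in AlexNet's forward) with the default start_dim=0, using a tensor shaped like AlexNet's feature output for a batch of 2.

```python
import torch

x = torch.rand(2, 128, 6, 6)                  # [batch, channel, height, width]
print(torch.flatten(x, start_dim=1).shape)    # torch.Size([2, 4608]): batch dimension kept
print(torch.flatten(x).shape)                 # torch.Size([9216]): start_dim=0 also merges the batch
```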

  • device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") specifies the device used for training: if a GPU is available, use the first GPU (cuda:0); otherwise, use the CPU. The model and the data must then be moved to this device with .to(device).
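A minimal sketch of device selection with a toy linear layer standing in for a real network. It runs on the CPU when no GPU is available; the key point is that the model and its inputs end up on the same device.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = nn.Linear(4, 2).to(device)        # moves the parameters to the device
inputs = torch.rand(3, 4).to(device)    # the data must live on the same device
outputs = net(inputs)
print(device, outputs.device)
```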

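  • Explanation of os.path.abspath(os.path.join(os.getcwd(), "../..")): os.getcwd() returns the current working directory (the directory the script was launched from, which is not necessarily where the file lives), "../.." goes up two directory levels, os.path.join joins the two paths, and os.path.abspath turns the result into a normalized absolute path.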
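A sketch of the path arithmetic with a fixed example path, so the result does not depend on where the script is run. The helper `dataset_root` and the path /home/user/project/train are hypothetical, and the expected output assumes a POSIX system.

```python
import os

def dataset_root(start_dir: str, levels_up: int = 2) -> str:
    """Return the absolute path `levels_up` directories above `start_dir`."""
    return os.path.abspath(os.path.join(start_dir, *([".."] * levels_up)))

print(dataset_root("/home/user/project/train"))   # /home/user (on POSIX systems)
print(os.getcwd())  # where the script was launched from, not necessarily where it lives
# Directory of the script file itself (inside a .py file, where __file__ is defined):
# os.path.dirname(os.path.abspath(__file__))
```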

  • torchvision.datasets.ImageFolder loads a dataset that is already sorted into folders, with the images of each class stored in their own folder. Its second parameter (transform) is the preprocessing to apply. Commonly used statements are as follows:

import os

from torchvision import datasets, transforms

data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
    "val": transforms.Compose([transforms.Resize((224, 224)),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # two levels above the working directory
image_path = data_root + "/data_set/flower_data/"  # dataset folder
train_dataset = datasets.ImageFolder(root=image_path + "/train",
                                     transform=data_transform["train"])
train_num = len(train_dataset)  # number of images in the training set
flower_list = train_dataset.class_to_idx  # dict mapping each class name to its index
  • cla_dict = dict((val, key) for key, val in flower_list.items()) inverts the dictionary above (index → class name), which is convenient later when mapping predicted indices back to class names.

  • json_str = json.dumps(cla_dict, indent=4) encodes the cla_dict dictionary as a JSON string.
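A sketch of the inversion and JSON round trip, using a hand-written dictionary in place of train_dataset.class_to_idx. One detail worth knowing: JSON object keys are always strings, so after loading the file the indices come back as "0", "1", and so on.

```python
import json

flower_list = {"daisy": 0, "roses": 1, "tulips": 2}   # stand-in for train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in flower_list.items())
print(cla_dict)                 # {0: 'daisy', 1: 'roses', 2: 'tulips'}

json_str = json.dumps(cla_dict, indent=4)
loaded = json.loads(json_str)
print(loaded["1"])              # roses: the integer keys became strings
```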

  • torch.utils.data.DataLoader takes the dataset just loaded and draws batches from it according to the batch_size and shuffle settings; shuffle=True reshuffles all samples at the start of every epoch before they are split into batches. The num_workers parameter sets how many worker processes load data. On Windows it is often left at 0 (multi-process loading there requires the script's entry point to be guarded with if __name__ == '__main__':); on Linux it can be set to a non-zero value to speed up data loading and therefore training.
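A minimal DataLoader sketch over a synthetic TensorDataset of 10 random "images" (an assumption in place of a real dataset). With batch_size=4 the last batch holds the remaining 2 samples, and shuffling changes the order but still visits every sample once per epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

images = torch.rand(10, 3, 32, 32)
labels = torch.arange(10)
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)
batch_sizes = [x.size(0) for x, y in loader]
print(batch_sizes)                                       # [4, 4, 2]

seen = sorted(int(v) for _, y in loader for v in y)      # one more epoch, in a new random order
print(seen == list(range(10)))                           # True: every sample appears once
```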

  • net.train() and net.eval() switch the behavior of Dropout and BatchNorm layers. Call net.train() during training to enable Dropout and let BatchNorm use batch statistics; call net.eval() during validation to disable Dropout and make BatchNorm use its running statistics.

  • In some Python projects, function parameters are followed by a colon and a type, and the signature may end with an arrow and a return type, for example def make_features(cfg: list): or def f(x: int) -> int:. These are type hints. Python does not enforce or check them at runtime: passing an argument of a different type raises no error. They only document the function's inputs and outputs for the people working on the code.
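A small demonstration that type hints are not enforced. The function `count_conv_layers` is hypothetical, and it treats "M" as a max-pool marker in a VGG-style configuration list purely for illustration.

```python
def count_conv_layers(cfg: list) -> int:
    """Count the entries of a VGG-style config that are not 'M' (max-pool markers)."""
    return sum(1 for v in cfg if v != "M")

print(count_conv_layers([64, 64, "M", 128]))   # 3
print(count_conv_layers((64, "M")))            # 1: a tuple works too, the hint is not checked
print(count_conv_layers.__annotations__)       # {'cfg': <class 'list'>, 'return': <class 'int'>}
```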

  • Passing variable arguments: putting * in front of a list or tuple unpacks its elements and passes them to the function as separate positional arguments, as in nn.Sequential(*layers). For a function to accept a variable number of positional arguments, it is defined in the form def function_name(*args):.

  • Variable positional arguments let you pass zero or more arguments, which are packed into a tuple when the function is called. Keyword arguments let you pass zero or more named arguments, which are packed into a dict inside the function. Just as with variable arguments, you can build a dict first and pass it with the shorthand **extra: **extra passes every key-value pair of the dict extra to the function as a keyword argument, and the function's **kwargs parameter receives them as a dict, as shown below:

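def vgg(model_name="vgg16", **kwargs):
    # VGG, make_features and cfg come from the full model.py (not shown here)
    model = VGG(make_features(cfg), **kwargs)  # kwargs arrives as {'num_classes': 5, 'init_weights': True}
    return model

net = vgg(model_name=model_name, num_classes=5, init_weights=True)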
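A self-contained sketch of both kinds of unpacking. `make_layers` and `report` are hypothetical helpers: the first builds a list of layers and unpacks it into nn.Sequential with *, the second shows that *args arrives as a tuple and **kwargs as a dict.

```python
import torch.nn as nn

def make_layers(channels: list) -> nn.Sequential:
    # build a list of layers, then unpack it into nn.Sequential with *
    layers = []
    in_c = 3
    for out_c in channels:
        layers += [nn.Conv2d(in_c, out_c, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
        in_c = out_c
    return nn.Sequential(*layers)   # same as nn.Sequential(layers[0], layers[1], ...)

def report(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword arguments
    return args, kwargs

features = make_layers([16, 32])
print(len(features))                                    # 4
extra = {"num_classes": 5, "init_weights": True}
print(report(1, 2, **extra))
# ((1, 2), {'num_classes': 5, 'init_weights': True})
```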
  • torch.cat concatenates a sequence of tensors along a specified (existing) dimension.

  • nn.AdaptiveAvgPool2d is adaptive average pooling. Its parameter is the desired output height and width: whatever the height and width of the input feature map, the output has the specified size.
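A shape-only sketch of the last two functions, with arbitrary example sizes: torch.cat along dim=1 stacks channels (the other dimensions must match), and AdaptiveAvgPool2d((1, 1)) produces a 1×1 output for inputs of any spatial size.

```python
import torch
import torch.nn as nn

a = torch.rand(1, 16, 8, 8)
b = torch.rand(1, 32, 8, 8)
print(torch.cat([a, b], dim=1).shape)            # torch.Size([1, 48, 8, 8]): joined along channels

pool = nn.AdaptiveAvgPool2d((1, 1))              # always 1x1 output, whatever the input size
print(pool(torch.rand(1, 512, 7, 7)).shape)      # torch.Size([1, 512, 1, 1])
print(pool(torch.rand(1, 512, 13, 9)).shape)     # torch.Size([1, 512, 1, 1])
```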


Origin blog.csdn.net/qq_42308217/article/details/113650317