The basic data object in PyTorch is the Tensor, a multi-dimensional array similar to a NumPy ndarray. Tensors can be placed on the GPU to accelerate computation, and converting to and from NumPy arrays is straightforward.
torch.nn provides ready-made layers and a range of activation functions, so complex models can be assembled quickly, like building blocks.
To build a model in PyTorch, define a class that inherits from nn.Module and implement two methods: 1. The initialization function (__init__), which defines the layers the network needs. 2. The forward function, which defines the forward-propagation process.
When creating a layer, the weights and biases are initialized randomly
The output size after convolution is N = (W - F + 2P)/S + 1, where W is the input size, F the kernel size, P the padding and S the stride.
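As a quick check of the formula, the sketch below evaluates it for a few settings. The helper name conv_output_size is hypothetical, and floor division is assumed for cases where the division is not exact, which is how PyTorch rounds by default.

```python
def conv_output_size(w: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output size N = (W - F + 2P) / S + 1, rounded down."""
    return (w - f + 2 * p) // s + 1


print(conv_output_size(32, 5))              # 5x5 kernel on a 32x32 input -> 28
print(conv_output_size(28, 2, s=2))         # 2x2 window with stride 2 -> 14
print(conv_output_size(224, 11, p=2, s=4))  # 11x11 kernel, padding 2, stride 4 -> 55
```

The same formula applies to pooling layers, with F taken as the pooling window size.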
Channel order of torch Tensor: [batch,channel,height,width]
CNN review: 1. The depth (number of channels) of each convolution kernel equals the number of input channels, and the number of convolution kernels equals the number of output channels (convolution process: each channel of a kernel is convolved with the corresponding channel of the input feature map, and the per-channel results are summed into one output channel). 2. The purpose of the pooling layer: shrink the feature map to cut computation (and the parameters of later layers) and make the features more robust (an empirical finding).
After max-pooling downsampling with a 2x2 window and stride 2, the height and width are halved (the pooling layer changes only the height and width, not the depth of the feature map).
When testing the model, you can pass a random tensor x of the expected shape as a stand-in for an input image.
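A minimal sketch of such a test, assuming a 32x32 RGB input and the layer sizes used for LeNet in these notes. The variable names are illustrative only.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 3, 32, 32)   # random "image": [batch, channel, height, width]
conv = nn.Conv2d(3, 16, 5)     # 3 input channels, 16 kernels of size 5x5
pool = nn.MaxPool2d(2, 2)      # 2x2 window, stride 2

y = conv(x)
z = pool(y)
print(y.shape)   # torch.Size([1, 16, 28, 28])
print(z.shape)   # torch.Size([1, 16, 14, 14])
```

Printing the shape after each layer is a simple way to confirm that the size formula and the layer definitions agree.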
super(LeNet, self).__init__() calls the constructor of the parent class nn.Module; the parent's initialization runs first, and then the statements that follow it are executed.
A convolutional layer is defined with nn.Conv2d. Its parameters, in order, are: number of input channels (input depth), number of output channels (number of convolution kernels), and kernel size; the stride defaults to 1 and the padding to 0.
A downsampling layer is defined with nn.MaxPool2d. Its parameters, in order, are: pooling window size, stride, and padding; when the stride is omitted it defaults to the window size.
A fully connected layer is defined with nn.Linear. It expects each sample as a one-dimensional vector, so the feature maps must be flattened first.
The output size of the last fully connected layer must match the number of classes in your own task.
x = F.relu(self.conv1(x)) passes the input through the first convolutional layer and then through the ReLU activation function.
The view function reshapes a tensor; its arguments give the new shape. In x.view(-1, 32*5*5), -1 means the first dimension (batch) is inferred automatically, and the second argument is the length of the flattened feature vector.
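A short sketch of the reshape described above, assuming a batch of 4 feature maps of shape [32, 5, 5] (the batch size is an arbitrary choice for illustration):

```python
import torch

x = torch.rand(4, 32, 5, 5)      # a batch of 4 feature maps
flat = x.view(-1, 32 * 5 * 5)    # -1 lets PyTorch infer the batch dimension
print(flat.shape)                # torch.Size([4, 800])
```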
The model.py file code of the simplest model LeNet network is as follows:

    import torch.nn as nn
    import torch.nn.functional as F


    class LeNet(nn.Module):
        def __init__(self):
            super(LeNet, self).__init__()
            self.conv1 = nn.Conv2d(3, 16, 5)
            self.pool1 = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(16, 32, 5)
            self.pool2 = nn.MaxPool2d(2, 2)
            self.fc1 = nn.Linear(32*5*5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):            # x is the input data, a tensor
            x = F.relu(self.conv1(x))    # input(3, 32, 32) output(16, 28, 28); first conv layer, then ReLU
            x = self.pool1(x)            # output(16, 14, 14)
            x = F.relu(self.conv2(x))    # output(32, 10, 10)
            x = self.pool2(x)            # output(32, 5, 5)
            # view reshapes the tensor: -1 infers the batch dimension, 32*5*5 is the flattened length
            x = x.view(-1, 32*5*5)       # output(32*5*5)
            x = F.relu(self.fc1(x))      # output(120)
            x = F.relu(self.fc2(x))      # output(84)
            # No softmax here: in theory one is needed, but the cross-entropy loss used
            # during training already applies a more efficient softmax internally.
            x = self.fc3(x)              # output(10)
            return x

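The note about the missing softmax can be checked directly. The sketch below, using made-up logits and labels, compares PyTorch's cross-entropy with an explicit log-softmax followed by the negative log-likelihood loss:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw network outputs: 4 samples, 10 classes
target = torch.tensor([1, 0, 9, 3])    # made-up class labels

loss_direct = F.cross_entropy(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(loss_direct, loss_manual))   # True
```

Because the loss applies log-softmax itself, adding a softmax layer at the end of the network would apply it twice during training.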
torchvision is PyTorch's computer-vision library; it is mainly used to build computer vision models. torchvision.transforms provides common image transformations. torchvision consists of the following parts:
> 1.torchvision.datasets: Some functions for loading data and commonly used data set interfaces;
> 2.torchvision.models: Contains commonly used model structures (including pre-trained models), such as AlexNet, VGG, ResNet, etc.;
> 3.torchvision.transforms: commonly used image transformations, such as cropping, rotation, etc.;
> 4.torchvision.utils: Some other useful methods.
> torchvision.transforms.Compose: this class chains several image transformations together. Its argument is a list whose elements are the transforms to apply, in order.
> transforms.Normalize: normalizes the data per channel, i.e. subtracts the mean and then divides by the standard deviation.
> transforms.ToTensor: converts a PIL Image or ndarray to a tensor and scales it to [0, 1]. Note: the scaling simply divides by 255, so if your ndarray uses a different value range you need to rescale it yourself.
> transforms.Resize: resizes the image to the given size.
> transforms.RandomResizedCrop: crops the image at a random size and aspect ratio, then resizes the crop to the given size.
> transforms.Grayscale: converts the image to grayscale.
> transforms.RandomHorizontalFlip: flips the image horizontally with the given probability (default 0.5).
> About optimizer.zero_grad(): it clears the gradients left over from previous steps and is normally called once per batch. If the old gradients are not cleared, each new backward pass adds to them. The batch size is usually chosen to fit the hardware; a larger batch often trains better, but memory limits frequently rule it out. In that case you can call optimizer.zero_grad() less often to emulate a large batch: accumulate the loss gradients of several small batches (with the loss scaled accordingly) and then update once, which is equivalent to back-propagating the loss gradient of one large batch.
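A minimal sketch of gradient accumulation, assuming a tiny linear model, random data and a mean-reduced MSE loss (all illustrative choices). It compares the gradient of one batch of 8 samples with the gradient accumulated over two batches of 4:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

# Reference: gradient of one large batch of 8 samples.
optimizer.zero_grad()
loss_fn(model(x), y).backward()
big_batch_grad = model.weight.grad.clone()

# Accumulation: two small batches of 4, with no zero_grad() in between.
accum_steps = 2
optimizer.zero_grad()
for xb, yb in zip(x.chunk(accum_steps), y.chunk(accum_steps)):
    loss = loss_fn(model(xb), yb) / accum_steps   # scale so the sum equals the large-batch mean
    loss.backward()                               # adds to the existing .grad
accumulated_grad = model.weight.grad.clone()
optimizer.step()                                  # one update after all small batches

print(torch.allclose(big_batch_grad, accumulated_grad, atol=1e-6))
```

Dividing each small-batch loss by the number of accumulation steps is what makes the two gradients match when the loss is averaged over the batch.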
General training classification network script, with notes:
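A minimal sketch of such a training loop. It uses synthetic random data in place of a real dataset and a small stand-in network, so the data, model and hyperparameters here are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 64 random "images" of shape [3, 32, 32], 10 classes.
images = torch.rand(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

# Small stand-in classifier; swap in the real network here.
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
loss_function = nn.CrossEntropyLoss()      # applies log-softmax internally
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)

epoch_losses = []
for epoch in range(2):
    net.train()                            # training mode
    running_loss = 0.0
    for inputs, targets in train_loader:
        optimizer.zero_grad()              # clear gradients from the previous batch
        outputs = net(inputs)              # forward pass
        loss = loss_function(outputs, targets)
        loss.backward()                    # back-propagate
        optimizer.step()                   # update parameters
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(train_loader))
    print(f"epoch {epoch + 1}: loss {epoch_losses[-1]:.3f}")
```

In a real script the trained weights would then be saved with torch.save(net.state_dict(), path) so that a prediction script can load them.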
General classification network prediction script, with notes:
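
    import torch
    import torchvision.transforms as transforms
    from PIL import Image
    from model import LeNet  # the LeNet class from model.py

    transform = transforms.Compose([transforms.Resize((32, 32)),
                                    transforms.ToTensor(),
                                    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    # Assumption: CIFAR-10 class names; replace with the classes your model was trained on.
    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    net = LeNet()
    net.load_state_dict(torch.load('Lenet.pth'))  # load the saved weights
    im = Image.open('1.jpg')
    im = transform(im)                 # [C, H, W]
    im = torch.unsqueeze(im, dim=0)    # [N, C, H, W]: add a batch dimension
    with torch.no_grad():              # no gradients are needed for prediction
        outputs = net(im)
        predict = torch.max(outputs, dim=1)[1].item()  # index of the predicted class
    print(classes[predict])
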
nn.Sequential packs a series of layers into a single module. It is commonly used when defining the layer structure in the model's initialization, and for long stacks of layers it is more compact than assigning each layer separately, as in self.conv1 = nn.Conv2d(3, 16, 5).
In nn.ReLU(inplace=True), the inplace parameter makes ReLU overwrite its input tensor instead of allocating a new one for the output. This lowers memory usage, so a larger model can fit in memory.
self.modules() returns an iterator over all modules (layers) in the network, and isinstance() checks whether an object is of a given type.
Both nn.init.kaiming_normal_ and nn.init.normal_ are parameter-initialization functions; the former applies Kaiming (He) normal initialization and the latter draws values from a normal distribution. These statements are not strictly needed and are written out for learning purposes: in torch, convolutional and fully connected layers are already initialized by default with a Kaiming-style scheme.
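The pieces above (self.modules(), isinstance() and the nn.init functions) are typically combined in one initialization method. A minimal sketch, assuming a hypothetical two-layer network TinyNet and illustrative initialization settings:

```python
import torch.nn as nn


class TinyNet(nn.Module):   # hypothetical network, for illustration only
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.fc = nn.Linear(8, 2)
        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():                  # visits every sub-module
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)   # mean 0, std 0.01
                nn.init.constant_(m.bias, 0)


net = TinyNet()
print(net.fc.bias)   # all zeros after initialization
```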
Dropout randomly deactivates a proportion of the nodes of a fully connected layer to reduce over-fitting. It is generally placed between fully connected layers. The p in nn.Dropout(p=0.5) is the probability with which each neuron is deactivated; the default is 0.5.
In nn.Linear(2048, 2048), the two arguments are the number of input neurons and the number of output neurons of the fully connected layer.
torch.flatten flattens a tensor starting from a chosen dimension (start_dim).
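For example, flattening from dimension 1 keeps the batch dimension and merges the rest. The tensor shape below is an arbitrary choice for illustration:

```python
import torch

x = torch.rand(4, 32, 5, 5)
flat = torch.flatten(x, start_dim=1)   # keep the batch dimension, flatten the rest
print(flat.shape)                      # torch.Size([4, 800])
```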
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu"): torch.device specifies the device used during training. The expression in the brackets means: if a GPU is available, use the first GPU; otherwise use the CPU.
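os.path.abspath(os.path.join(os.getcwd(), "../..")): os.getcwd() returns the current working directory (the directory the script is run from, which is not necessarily where the file is located), "../.." moves up two directory levels, os.path.join joins the two paths, and os.path.abspath converts the result to a normalized absolute path.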
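A small sketch of this path handling, assuming the relative part is "../.." (two levels up):

```python
import os

cwd = os.getcwd()                                       # current working directory
two_up = os.path.abspath(os.path.join(cwd, "../.."))    # resolve two levels up
print(cwd)
print(two_up)
```

The printed paths depend on where the script is run, so no fixed output is shown.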
torchvision.datasets.ImageFolder is PyTorch's loader for image datasets stored in folders. It assumes the dataset is already sorted into folders, with photos of the same category under the same folder. The second parameter is the data preprocessing (transform). Statements commonly used alongside it are as follows:
cla_dict = dict((val, key) for key, val in flower_list.items()): this statement swaps the keys and values of the class-to-index dictionary flower_list (such as the class_to_idx mapping that ImageFolder provides), giving an index-to-class dictionary for later use.
json_str = json.dumps(cla_dict, indent=4): this statement encodes the cla_dict dictionary as a JSON string.
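The two statements can be tried on a small hand-written dictionary. The class names below are hypothetical and stand in for a real class-to-index mapping:

```python
import json

# Hypothetical class-to-index mapping, in the same form as ImageFolder's class_to_idx.
flower_list = {"daisy": 0, "roses": 1, "tulips": 2}

cla_dict = dict((val, key) for key, val in flower_list.items())
json_str = json.dumps(cla_dict, indent=4)
print(cla_dict)    # {0: 'daisy', 1: 'roses', 2: 'tulips'}
print(json_str)    # the integer keys are written as JSON strings, e.g. "0": "daisy"
```

Because JSON object keys are always strings, reading the file back gives keys such as "0" rather than 0.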
torch.utils.data.DataLoader takes the loaded dataset and, according to the batch_size and shuffle settings, draws batches of data from it. The num_workers parameter is the number of worker subprocesses used for loading: 0 loads data in the main process and is the safe choice on Windows, while on Linux a non-zero value can speed up data loading and therefore training. Note: shuffle=True shuffles all the data first and then takes batches.
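A minimal sketch of batching with DataLoader, assuming a synthetic dataset of 10 random samples in place of real images:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.rand(10, 3, 32, 32), torch.arange(10))
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_sizes = []
for images, labels in loader:
    batch_sizes.append(images.shape[0])
print(batch_sizes)   # [4, 4, 2]
```

With 10 samples and a batch size of 4, the last batch holds the 2 remaining samples.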
net.train() and net.eval() control the behavior of dropout and BN (batch normalization) layers. Calling net.train() during training enables dropout and makes BN use per-batch statistics; calling net.eval() during validation disables dropout and makes BN use its stored running statistics.
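The effect on dropout can be seen with a network that contains only a dropout layer (an illustrative setup, not a realistic model):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Dropout(p=0.5))
x = torch.ones(1, 1000)

net.train()
train_out = net(x)   # about half the elements are zeroed, the rest are scaled by 1/(1-p) = 2
net.eval()
eval_out = net(x)    # dropout does nothing in eval mode

print((train_out == 0).float().mean())   # roughly 0.5
print(torch.equal(eval_out, x))          # True
```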
In some Python projects, function parameters are followed by a colon and a type, and some function signatures are followed by an arrow, for example: def make_features(cfg: list):. These type hints are not enforced or checked at runtime: even if the arguments actually passed in do not match the hinted types, no error is raised. They exist to help programmers understand a function's inputs and outputs during collaborative development.
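A short sketch showing that hints are not enforced. The function count_layers is hypothetical and exists only for this demonstration:

```python
def count_layers(cfg: list) -> int:
    """Hinted as taking a list and returning an int."""
    return len(cfg)


print(count_layers([64, "M", 128]))   # 3
print(count_layers("abc"))            # 3: a str is accepted despite the list hint
print(count_layers.__annotations__)   # the hints are stored but not checked
```

Static checkers such as mypy can flag the second call, but the Python interpreter itself does not.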
Passing variable arguments: putting a * in front of a list or tuple unpacks its elements into separate positional arguments, as in nn.Sequential(*layers). The function must of course be defined to accept them, in the form def function_name(*a):.
Variable arguments let you pass in zero or more positional arguments, which are automatically collected into a tuple when the function is called. Keyword arguments let you pass in zero or more named arguments, which are automatically collected into a dict inside the function. As with variable arguments, you can build a dict first and pass it with the shorthand **extra, which passes every key-value pair of the dict extra as a keyword argument; the function's ** parameter then receives them as a dict, as shown below:

    def vgg(model_name="vgg16", **kwargs):
        # cfg (the layer configuration for model_name) is not shown in this excerpt
        model = VGG(make_features(cfg), **kwargs)
        return model

    net = vgg(model_name=model_name, num_classes=5, init_weights=True)

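The packing and unpacking rules can be tried without any model code. In this sketch the function describe and its arguments are hypothetical:

```python
def describe(*layers, **options):
    # layers is a tuple of the positional arguments,
    # options is a dict of the keyword arguments
    return layers, options


layer_names = ["conv", "relu", "pool"]
extra = {"num_classes": 5, "init_weights": True}

layers, options = describe(*layer_names, **extra)
print(layers)    # ('conv', 'relu', 'pool')
print(options)   # {'num_classes': 5, 'init_weights': True}
```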
torch.cat joins tensors together along the specified dimension.
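For instance, two feature maps with different channel counts can be joined along the channel dimension. The shapes below are arbitrary:

```python
import torch

a = torch.zeros(2, 3, 4, 4)
b = torch.ones(2, 5, 4, 4)
c = torch.cat([a, b], dim=1)   # join along the channel dimension
print(c.shape)                 # torch.Size([2, 8, 4, 4])
```

All dimensions other than the one being joined must match.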
nn.AdaptiveAvgPool2d is an adaptive average pooling layer. Its parameter is the height and width of the output feature map: whatever the height and width of the input, the output has the specified size.
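A brief sketch, assuming a target output size of 1x1 and a few arbitrary input sizes:

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((1, 1))
shapes = []
for size in (7, 14, 32):
    out = pool(torch.rand(2, 16, size, size))
    shapes.append(tuple(out.shape))
print(shapes)   # [(2, 16, 1, 1), (2, 16, 1, 1), (2, 16, 1, 1)]
```

The batch and channel dimensions are unchanged; only the height and width are pooled to the requested size.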