PyTorch model training: the whole process of dataset loading with ImageFolder

Loading a dataset with ImageFolder

ImageFolder is a general data loader for datasets whose images are organized into one subfolder per class.

The function signature is as follows:

ImageFolder(root, transform=None, target_transform=None, loader=default_loader)

Parameter explanation

  • root: the path of the directory to load images from

  • transform: transformation applied to each PIL Image; the input of transform is the object returned by loader when reading the image

  • target_transform: transformation applied to the label

  • loader: how to read an image given its path; the default reads the file into a PIL Image object in RGB format

Labels are stored in a dictionary built from the folder names in sorted order, i.e. {class name: class index (starting from 0)}. Generally speaking, it is best to name the folders directly with numbers starting from 0, so the names coincide with the labels ImageFolder actually assigns. If the folders do not follow this naming convention, it is recommended to inspect the self.class_to_idx attribute to understand the mapping between labels and folder names.
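The sorted-folder-name mapping can be sketched in plain Python. This is a toy reconstruction of what ImageFolder does internally, not its actual code:

```python
# Toy sketch: how subfolder names become class indices.
classes = sorted(['dog', 'cat'])  # subfolder names found under root
class_to_idx = {name: i for i, name in enumerate(classes)}
print(class_to_idx)  # → {'cat': 0, 'dog': 1}
```

Because the names are sorted first, 'cat' gets index 0 and 'dog' gets index 1, matching the output shown below.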

(figure: dataset directory structure)

    data/dogcat_2/
        cat/
            cat.12484.jpg
            ...
        dog/
            dog.12496.jpg
            ...

It can be seen that the dataset is divided into two classes: cat and dog.

import torchvision.datasets as dset
dataset = dset.ImageFolder('./data/dogcat_2')  # no transform yet; first look at the raw image data
print(dataset.classes)       # classes, determined by the names of the subfolders
print(dataset.class_to_idx)  # classes are indexed 0, 1, ... in sorted order
print(dataset.imgs)          # paths of the images from all subfolders, with their class indices

Output:

['cat', 'dog']
{'cat': 0, 'dog': 1}
[('./data/dogcat_2/cat/cat.12484.jpg', 0), ('./data/dogcat_2/cat/cat.12485.jpg', 0), ('./data/dogcat_2/cat/cat.12486.jpg', 0), ('./data/dogcat_2/cat/cat.12487.jpg', 0), ('./data/dogcat_2/dog/dog.12496.jpg', 1), ('./data/dogcat_2/dog/dog.12497.jpg', 1), ('./data/dogcat_2/dog/dog.12498.jpg', 1), ('./data/dogcat_2/dog/dog.12499.jpg', 1)]

View the obtained image data:

# the result shows that the data obtained is still a PIL Image object
print(dataset[0])     # an (image, label) tuple
print(dataset[0][0])  # the PIL Image
print(dataset[0][1])  # the label: 0, i.e. cat

Output:

(<PIL.Image.Image image mode=RGB size=497x500 at 0x11D99A9B0>, 0)
<PIL.Image.Image image mode=RGB size=497x500 at 0x11DD24278>
0

Then use a transform to convert the image into a tensor:

import os
from torchvision import datasets, transforms

data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor()
                                 ]),
    "val": transforms.Compose([transforms.Resize((224, 224)),  # an int would resize only the shorter side; (224, 224) resizes both
                               transforms.ToTensor()
                               ])}

train_dataset = datasets.ImageFolder(root=os.path.join(data_root, "train"),
                                     transform=data_transform["train"])

print(train_dataset[0])
print(train_dataset[0][0])
print(train_dataset[0][1])  # the label: 0, i.e. cat

Output:

(tensor([[[0.3765, 0.3765, 0.3765,  ..., 0.3765, 0.3765, 0.3765],
         [0.3765, 0.3765, 0.3765,  ..., 0.3765, 0.3765, 0.3765],
         [0.3804, 0.3804, 0.3804,  ..., 0.3765, 0.3765, 0.3765],
         ...,
         [0.4941, 0.4941, 0.4902,  ..., 0.3804, 0.3765, 0.3765],
         [0.5098, 0.5098, 0.5059,  ..., 0.3804, 0.3765, 0.3765],
         [0.5098, 0.5098, 0.5059,  ..., 0.3804, 0.3765, 0.3765]],

        [[0.5686, 0.5686, 0.5686,  ..., 0.5843, 0.5804, 0.5804],
         [0.5686, 0.5686, 0.5686,  ..., 0.5843, 0.5804, 0.5804],
         [0.5725, 0.5725, 0.5725,  ..., 0.5843, 0.5804, 0.5804],
         ...,
         [0.5686, 0.5686, 0.5686,  ..., 0.5961, 0.5922, 0.5922],
         [0.5686, 0.5686, 0.5686,  ..., 0.5961, 0.5922, 0.5922],
         [0.5686, 0.5686, 0.5686,  ..., 0.5961, 0.5922, 0.5922]],

        [[0.4824, 0.4824, 0.4824,  ..., 0.4902, 0.4902, 0.4902],
         [0.4824, 0.4824, 0.4824,  ..., 0.4902, 0.4902, 0.4902],
         [0.4863, 0.4863, 0.4863,  ..., 0.4902, 0.4902, 0.4902],
         ...,
         [0.3647, 0.3686, 0.3804,  ..., 0.4824, 0.4784, 0.4784],
         [0.3451, 0.3490, 0.3608,  ..., 0.4824, 0.4784, 0.4784],
         [0.3451, 0.3490, 0.3608,  ..., 0.4824, 0.4784, 0.4784]]]), 0)
         
         (output abbreviated)
         image tensor [........]
         label tensor [0]

You can see that train_dataset[0] is a tuple: one element is a tensor (the image) and the other is a scalar (the label).

    print(train_dataset[0][0].shape)
    print(train_dataset[0][1])

Output:

torch.Size([3, 224, 224])
0

DataLoader generates batch training data

from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=1, shuffle=True)

for epoch in range(epoch_num):
    for images, labels in train_loader:
        print("labels: ", labels, labels.dtype)
        print("images: ", images, images.dtype)

Output:

labels:  tensor([0]) torch.int64

images:  tensor([[[[0.2275, 0.2275, 0.2235,  ..., 0.2196, 0.2196, 0.2196],
          [0.2275, 0.2275, 0.2235,  ..., 0.2196, 0.2196, 0.2196],
          [0.2235, 0.2235, 0.2235,  ..., 0.2157, 0.2157, 0.2157],
          ...,
          [0.2392, 0.2392, 0.2392,  ..., 0.2235, 0.2235, 0.2235],
          [0.2353, 0.2353, 0.2353,  ..., 0.2235, 0.2275, 0.2275],
          [0.2353, 0.2353, 0.2353,  ..., 0.2235, 0.2275, 0.2275]],

         [[0.2392, 0.2392, 0.2353,  ..., 0.2275, 0.2275, 0.2275],
          [0.2392, 0.2392, 0.2353,  ..., 0.2275, 0.2275, 0.2275],
          [0.2353, 0.2353, 0.2353,  ..., 0.2235, 0.2235, 0.2235],
          ...,
          [0.2275, 0.2275, 0.2314,  ..., 0.2196, 0.2196, 0.2196],
          [0.2235, 0.2235, 0.2275,  ..., 0.2196, 0.2196, 0.2196],
          [0.2235, 0.2235, 0.2275,  ..., 0.2196, 0.2196, 0.2196]],

         [[0.1961, 0.1961, 0.1961,  ..., 0.1765, 0.1765, 0.1765],
          [0.1961, 0.1961, 0.1961,  ..., 0.1765, 0.1765, 0.1765],
          [0.1922, 0.1922, 0.1922,  ..., 0.1725, 0.1725, 0.1725],
          ...,
          [0.1529, 0.1529, 0.1529,  ..., 0.1686, 0.1686, 0.1686],
          [0.1490, 0.1490, 0.1490,  ..., 0.1686, 0.1686, 0.1686],
          [0.1490, 0.1490, 0.1490,  ..., 0.1686, 0.1686, 0.1686]]]]) torch.float32
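Conceptually, DataLoader groups the dataset's (image, label) samples into batches. A stripped-down, pure-Python sketch of that grouping (ignoring shuffling, worker processes, and tensor collation) looks like this:

```python
def simple_batches(samples, batch_size):
    """Toy stand-in for DataLoader: yield (images, labels) lists per batch."""
    for i in range(0, len(samples), batch_size):
        batch = samples[i:i + batch_size]
        images = [img for img, _ in batch]   # collect the images of this batch
        labels = [lbl for _, lbl in batch]   # collect the matching labels
        yield images, labels

# hypothetical samples: (image placeholder, label) pairs
samples = [('cat1', 0), ('cat2', 0), ('dog1', 1), ('dog2', 1)]
for images, labels in simple_batches(samples, batch_size=2):
    print(images, labels)
```

The real DataLoader additionally stacks the images of a batch into a single tensor, which is why the printed image tensor above has a leading batch dimension.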

Loading batches with enumerate

The enumerate() function wraps a traversable object (such as a list, tuple, or string) into an index sequence, yielding both the index and the element at the same time; it is generally used in a for loop.

You can use this loop to load each batch as a tuple and obtain the corresponding images and labels.
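A minimal illustration of enumerate in plain Python (the list here is just a stand-in for any iterable, such as a DataLoader):

```python
fruits = ['cat', 'dog']  # stand-in for any iterable
for index, name in enumerate(fruits):
    print(index, name)  # → 0 cat, then 1 dog
```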

Multi-element assignment

When Python detects multiple variables on the left of the equals sign and a list or tuple on the right, it automatically unpacks the list or tuple and assigns its elements to the variables in turn.

l = [1, 2]
a, b = l
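The same rule applies to tuples, including the (images, labels) pairs yielded by a DataLoader. A small sketch with a hypothetical pair:

```python
batch = ([0.1, 0.2], 0)  # hypothetical (image data, label) pair
image, label = batch     # unpacked positionally
print(image, label)      # → [0.1, 0.2] 0
```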

So you can

for step, data in enumerate(train_bar):  # train_bar: e.g. the train_loader wrapped by tqdm
    images, labels = data

Finally, nn.CrossEntropyLoss() computes the loss

nn.CrossEntropyLoss() is the integration of nn.LogSoftmax() and nn.NLLLoss(); it can directly replace the two in the network. Let's look at the calculation process.

loss(x, class) = -log( exp(x[class]) / Σ_j exp(x[j]) ) = -x[class] + log( Σ_j exp(x[j]) )

where x is the output vector of the network and class is the true label.

For example, suppose a three-class network outputs [-0.7715, -0.6205, -0.2562] for an input sample whose true label is 0. The loss computed by nn.CrossEntropyLoss() is

loss = -x[0] + log( exp(-0.7715) + exp(-0.6205) + exp(-0.2562) ) = 0.7715 + log(1.7740) ≈ 1.3447
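This value can be checked by hand with the formula above, in pure Python with no torch required:

```python
import math

x = [-0.7715, -0.6205, -0.2562]  # network output (logits)
target = 0                        # true class label

# cross entropy = -x[target] + log(sum_j exp(x[j]))
loss = -x[target] + math.log(sum(math.exp(v) for v in x))
print(round(loss, 4))  # → 1.3447
```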



Origin blog.csdn.net/ahelloyou/article/details/115185996