PyTorch official English documentation study notes (2: torch.utils.data.Dataset, torch.utils.data.DataLoader, torchvision)

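PyTorch provides two data primitives: torch.utils.data.DataLoader and torch.utils.data.Dataset, allowing you to use preloaded datasets as well as your own data. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset for easy access to the samples.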

1. torch.utils.data.Dataset

A custom dataset class (a subclass of Dataset) must implement three methods: __init__, __len__, and __getitem__.

__len__(): Returns the size of the dataset. A dataset is an ordinary object, not a built-in sequence type (list, tuple, string), so len() does not work on it by default. Implementing __len__() lets len(dataset) return the number of samples, just as it would for a sequence.
__getitem__(): Returns the sample at a given index. A sequence lets you fetch any element by index, and __getitem__() gives the dataset object the same ability, so dataset[idx] works. Data preprocessing can also be done inside __getitem__().
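
To make the roles of __len__() and __getitem__() concrete, here is a minimal sketch. SquaresDataset is a hypothetical toy class invented for illustration; it is not part of PyTorch or of the official tutorial.

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Called by len(dataset)
        return self.n

    def __getitem__(self, idx):
        # Called by dataset[idx]; preprocessing could also happen here
        x = torch.tensor(float(idx))
        return x, x ** 2


dataset = SquaresDataset(5)
print(len(dataset))   # number of samples
x, y = dataset[3]     # indexing goes through __getitem__
print(x.item(), y.item())
```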

The following example implements all three: the FashionMNIST images are stored in a directory img_dir, and their labels are stored separately in a CSV file annotations_file.

import os
import pandas as pd
from torch.utils.data import Dataset
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

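A note on read_image: its return value is already a tensor, so the image in the code above does not need to be converted with transforms.ToTensor. The returned tensor is uint8 with values in [0, 255] and shape (C, H, W); unlike ToTensor, read_image does not rescale the values to [0, 1].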

torchvision.io.read_image(path: str, mode: ImageReadMode = ImageReadMode.UNCHANGED) → Tensor
path (str) – path of the JPEG or PNG image.

Below is an example of how to load the Fashion-MNIST dataset from TorchVision. Fashion-MNIST is a Zalando article image dataset consisting of 60,000 training instances and 10,000 test instances. Each example consists of a 28×28 grayscale image and an associated label from 10 categories.
·root is the path where the training/test data is stored.
·train specifies which part of the dataset to load once it has been downloaded. True loads the training set; False loads the test set.
·download=True downloads the data from the Internet if it is not already present in root.
·transform specifies the transformation applied to the data (the images) when the dataset is loaded.
·target_transform specifies the transformation applied to the labels.
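
As an illustration of target_transform, the sketch below turns an integer class label into a one-hot vector. The helper name one_hot_target is made up for this note; any callable that takes a label and returns the transformed label would do.

```python
import torch


def one_hot_target(label, num_classes=10):
    """Hypothetical target_transform: integer class label -> one-hot float vector."""
    target = torch.zeros(num_classes, dtype=torch.float)
    target[label] = 1.0
    return target


# It would be passed to the dataset as target_transform=one_hot_target
encoded = one_hot_target(3)
print(encoded)
```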

import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt

training_data = datasets.FashionMNIST(
    root="./data",  # the data folder under the project root
    train=True,
    download=True,
    transform=ToTensor()
)
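# If the download is too slow inside PyCharm, hold Ctrl and click FashionMNIST to open its source, find the url attribute,
# paste it into a browser to download the files directly, then move them to the correct location in this project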

2. torch.utils.data.DataLoader

Function: DataLoader wraps a Dataset object (a built-in one or an instance of a custom dataset class) in an iterable that yields the contents of the Dataset batch by batch.

torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=None, sampler=None, batch_sampler=None, 
num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, 
multiprocessing_context=None, generator=None, *, prefetch_factor=2,
 persistent_workers=False, pin_memory_device='')

Several important parameters of __init__ in the DataLoader class:
·dataset: The dataset to load from. This is the output of one of PyTorch's existing data reading interfaces (such as torchvision.datasets.ImageFolder) or of a custom data interface; either way it is an object of the torch.utils.data.Dataset class or of a custom class that inherits from it.
·batch_size (int, optional) – how many samples per batch to load (default: 1).
·shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False). Generally set to True for training data.
·drop_last (bool, optional) – set to True to drop the last incomplete batch if the dataset size is not divisible by the batch size. If False and the dataset size is not divisible by the batch size, the last batch will be smaller (default: False).
·num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process (default: 0).
·sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.
·batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.
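
The effect of drop_last and batch_sampler can be checked on a tiny dataset. This is a sketch on made-up toy data (10 samples wrapped in a TensorDataset), not the FashionMNIST data used elsewhere in these notes.

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, SequentialSampler, TensorDataset

# Hypothetical toy data: 10 samples with one feature each, labels 0..9
features = torch.arange(10, dtype=torch.float).unsqueeze(1)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# drop_last=False keeps the smaller final batch
keep = [len(y) for _, y in DataLoader(dataset, batch_size=4, drop_last=False)]

# drop_last=True discards it
drop = [len(y) for _, y in DataLoader(dataset, batch_size=4, drop_last=True)]

# batch_sampler yields whole lists of indices, so batch_size, shuffle,
# sampler and drop_last are left at their defaults on the DataLoader itself
batch_sampler = BatchSampler(SequentialSampler(dataset), batch_size=3, drop_last=False)
by_sampler = [y.tolist() for _, y in DataLoader(dataset, batch_sampler=batch_sampler)]

print(keep, drop, by_sampler)
```

With 10 samples and batch_size=4, the last batch holds only 2 samples, which is exactly the batch that drop_last=True removes.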

from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
train_features, train_labels = next(iter(train_dataloader))
# iter(dataloader) returns an iterator, which can then be advanced with next()
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")

DataLoader is essentially an iterable object. iter(dataloader) returns an iterator, which can then be advanced with next(); you can also iterate over it directly with for inputs, labels in dataloader.
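
A typical training loop uses the for form. The sketch below runs two epochs over made-up toy data to show that each for loop starts a fresh pass over the dataset; the tensors and sizes are placeholders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 8 samples, 2 features each
dataset = TensorDataset(torch.randn(8, 2), torch.arange(8))
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

seen = []
for epoch in range(2):
    # Each for loop calls iter(dataloader) again, which starts a new pass
    for inputs, labels in dataloader:
        seen.append((epoch, tuple(inputs.shape), labels.tolist()))

print(len(seen))  # 2 epochs x 2 batches per epoch
```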

3. torchvision

The PyTorch team has developed a dedicated computer vision toolkit, torchvision. The package is distributed separately from PyTorch and mainly contains three parts:

① ·datasets: Provides loaders for commonly used datasets, all implemented as subclasses of torch.utils.data.Dataset, including MNIST, CIFAR10/100, ImageNet, COCO, etc.;

A commonly used one is torchvision.datasets.ImageFolder:
torchvision.datasets.ImageFolder(root: str,
transform: Optional[Callable] = None,
target_transform: Optional[Callable] = None,
loader: Callable[[str], Any] = ,
is_valid_file: Optional[Callable[[str], bool]] = None)
Parameters
·root (string)– Root directory path.
The root directory must contain subfolders (one per class), with the images placed inside those subfolders.
·transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
·target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
·loader (callable, optional) – A function to load an image given its path.
·is_valid_file – A function that takes the path of an image file and checks whether the file is valid (used to check for corrupt files)

dataset = torchvision.datasets.ImageFolder(dataset_path, transform=transforms)
dataloader = DataLoader(dataset, batch_size=args.batch_size, shuffle=True)

② ·transforms: Provides commonly used data preprocessing operations, mainly including operations on Tensor and PIL Image objects;
③·models: Provides network structures and pre-trained models of various classic networks in deep learning, including AlexNet, VGG series, ResNet series, Inception series, etc.;

1. datasets: see the examples above.
All the packaged datasets in torchvision.datasets are subclasses of torch.utils.data.Dataset. They all implement the __getitem__ and __len__ methods, so they can all be loaded with torch.utils.data.DataLoader.

2. torchvision.transforms contains a large number of data transformation classes, mainly operating on Tensor and PIL (Python Imaging Library) Image objects. Many of them can be used for data augmentation. If the image data available for training is very limited for the problem at hand, new training samples have to be generated by applying various transformations to the existing images. Shrinking or enlarging an image, flipping it horizontally or vertically, and so on are all data augmentation methods.

It should be noted that a transform is generally applied in two steps.
The first step is to construct the transform object, such as transf = transforms.Normalize(mean=x, std=y).
The second step is to apply it, such as output = transf(input).

transforms.Compose is also available to chain a series of transforms:

torchvision.transforms.Compose([ ts1,ts2,ts3... ])
# each ts is a transform; the output type of ts1 must match the input type of ts2

Let's take a look at the commonly used data transformation operations in torchvision.transforms.
This part is easy to understand by reading the source code and its docstrings directly. If a docstring does not clearly state the input and output types, you can inspect them yourself with

print()
print(type())

torchvision.transforms.Resize: Change the image size to the specified size

'''
size (sequence or int): Desired output size. If size is a sequence like
            (h, w), output size will be matched to this. If size is an int,
            smaller edge of the image will be matched to this number.
            i.e, if height > width, then image will be rescaled to
            (size * height / width, size)
'''
tran_resize=transforms.Resize((512,512))
resize_img=tran_resize(tensor_img)

torchvision.transforms.ToTensor: Convert an image in PIL or numpy format with shape (H, W, C) into a tensor with shape (C, H, W). At the same time, each value in the [0,255] range is divided by 255 and normalized to the [0,1] range.

Converts a PIL Image or numpy.ndarray (H x W x C) in the range
    [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
    if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1)
    or if the numpy.ndarray has dtype = np.uint8

torchvision.transforms.ToPILImage: Used to convert Tensor variable data into PIL image data, mainly to facilitate the display of image content.

transforms.Normalize(mean, std, inplace=False): Execute the normalization formula on the image channel by channel: output = (input - mean) / std.
We need to calculate the mean and std of each channel ourselves. If the data set is large, the mean and std should be calculated by sampling.
mean: the mean of each channel
std: the standard deviation of each channel
inplace: whether to modify the input itself
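
One possible way to obtain the per-channel mean and std is sketched below. The random tensor stands in for a sample of images drawn from the dataset; with real data you would stack a subset of your own images instead.

```python
import torch

torch.manual_seed(0)
# Placeholder for a sample of images drawn from the dataset: (N, C, H, W) in [0, 1]
images = torch.rand(64, 3, 28, 28)

# Reduce over every dimension except the channel dimension
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))

# output = (input - mean) / std, broadcast per channel
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]

print(mean, std)
print(normalized.mean(dim=(0, 2, 3)), normalized.std(dim=(0, 2, 3)))
```

The computed mean and std are the values that would then be passed to transforms.Normalize(mean, std).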

Why does this speed up the convergence of the model?
One explanation that has been offered: if the inputs are all distributed in (0, 1), the bias b that the network actually needs may be relatively large, while b is initialized to 0, so the network converges slowly. After Normalize the inputs are centered around 0, which can speed up convergence.

from PIL import Image
from torchvision import transforms

img_path="数据集/train/ants_image/0013035.jpg"  # replace with the path to an image in your own project
img=Image.open(img_path)
tensor_img=transforms.ToTensor()(img)
norm=transforms.Normalize([0.5,0.5,0.5],[0.5,0.5,0.5])
tensor_img=norm(tensor_img)

torchvision.transforms.RandomCrop

torchvision also provides two commonly used functions.
One is make_grid, which stitches multiple images together into a single grid.
See the detailed explanation of the make_grid function on the PyTorch official website:

def make_grid(
    tensor: Union[torch.Tensor, List[torch.Tensor]],
    nrow: int = 8,  # default: 8 images per row
    padding: int = 2,
    normalize: bool = False,
    value_range: Optional[Tuple[int, int]] = None,
    scale_each: bool = False,
    pad_value: float = 0.0,
    **kwargs,
) -> torch.Tensor:
Args:
        tensor (Tensor or list): 4D mini-batch Tensor of shape (B x C x H x W)
            or a list of images all of the same size.
        nrow (int, optional): Number of images displayed in each row of the grid.
            The final grid size is ``(B / nrow, nrow)``. Default: ``8``.
        padding (int, optional): amount of padding. Default: ``2``.
        normalize (bool, optional): If True, shift the image to the range (0, 1),
            by the min and max values specified by ``value_range``. Default: ``False``.
        value_range (tuple, optional): tuple (min, max) where min and max are numbers,
            then these numbers are used to normalize the image. By default, min and max
            are computed from the tensor.
        range (tuple. optional):
            .. warning::
                This parameter was deprecated in ``0.12`` and will be removed in ``0.14``. Please use ``value_range``
                instead.
        scale_each (bool, optional): If ``True``, scale each image in the batch of
            images separately rather than the (min, max) over all images. Default: ``False``.
        pad_value (float, optional): Value for the padded pixels. Default: ``0``.

    Returns:
        grid (Tensor): the tensor containing grid of images.

The other is save_image, which saves a Tensor as an image file.

def save_image(
    tensor: Union[torch.Tensor, List[torch.Tensor]],
    fp: Union[str, pathlib.Path, BinaryIO],
    format: Optional[str] = None,
    **kwargs,
) -> None:
    """
    Save a given Tensor into an image file.

    Args:
        tensor (Tensor or list): Image to be saved. If given a mini-batch tensor,
            saves the tensor as a grid of images by calling ``make_grid``.
        fp (string or file object): A filename or a file object
        format(Optional):  If omitted, the format to use is determined from the filename extension.
            If a file object was used instead of a filename, this parameter should always be used.
        **kwargs: Other arguments are documented in ``make_grid``.
    """

example:

training_data = datasets.FashionMNIST(
    root="./data",  # the data folder under the project root
    train=True,
    download=True,
    transform=ToTensor()
)
train_dataloader = DataLoader(training_data, batch_size=16, shuffle=True)
from torchvision.utils import make_grid, save_image
dataiter = iter(train_dataloader)
img = make_grid(next(dataiter)[0], 4)  # tile 4 images per row; the leading batch dimension (number of images) is dropped, leaving one 3-D (C, H, W) image
save_image(img, 'a.png')


3. models
① Model import and modification
https://pytorch.org/vision/stable/models.html
After opening the web page, you can see the introduction on the right. There are many existing models provided here. The classification model VGG is introduced below.

torchvision.models.vgg16(*, weights: Optional[torchvision.models.vgg.VGG16_Weights] = None, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG
——VGG-16 from Very Deep Convolutional Networks for Large-Scale Image Recognition.
Parameters：
weights (VGG16_Weights, optional) – The pretrained weights to use. See VGG16_Weights below for more details, and possible values. By default, no pre-trained weights are used.
weights specifies which set of pretrained weights the model is loaded with.

progress (bool, optional) – If True, displays a progress bar of the download to stderr. Default is True.
If True, a download progress bar is displayed.

**kwargs – parameters passed to the torchvision.models.vgg.VGG base class. Please refer to the source code for more details about this class.

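import torchvision
from torch import nn

vgg_16=torchvision.models.vgg16()
print(vgg_16)
vgg_16.classifier.add_module('add_Linear',nn.Linear(1000,10))
# appends a Linear(1000,10) layer to the end of the classifier block of vgg_16

vgg_16.add_module('add_Linear',nn.Linear(1000,10))
# registers the layer on the model itself instead; VGG.forward does not call it,
# so this top-level layer does not change the model's output
print(vgg_16)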

② Saving and loading a model

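# Option 1: save the model structure + the model parameters
torch.save(vgg_16,"vgg.pth")

# Option 2: save only the model parameters (officially recommended)
torch.save(vgg_16.state_dict(),"vgg.pth")

# Load: torch.load returns whatever object was saved. For option 1 that is the model itself;
# for option 2 it is a state_dict, which still has to be passed to model.load_state_dict()
torch.load("vgg.pth")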
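
A complete save/load round trip for the parameters-only approach might look like the sketch below. A small nn.Linear stands in for VGG so that it runs quickly, and the file name model_state.pth is arbitrary.

```python
import os
import tempfile

import torch
from torch import nn

# Small stand-in model so the example runs quickly (not VGG)
model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "model_state.pth")

# Save only the parameters
torch.save(model.state_dict(), path)

# Rebuild the same structure, then load the parameters into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path))

print(torch.equal(model.weight, restored.weight))
```

Loading a whole pickled model (the first option) is more fragile. In recent PyTorch releases torch.load defaults to weights_only=True, so that option additionally needs weights_only=False.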