Data Processing Toolbox Overview
Data processing in PyTorch covers data loading, data preprocessing, data augmentation, and related steps. The main toolkits and their interrelationships are shown in the figure.
- Dataset: An abstract class. Custom datasets inherit from it and override two methods, `__getitem__` and `__len__`.
- DataLoader: Wraps a dataset in an iterator that reads data in batches, can shuffle the data, and can load in parallel with multiple worker processes.
- random_split: Randomly splits a dataset into new, non-overlapping datasets of the given lengths.
- Sampler: Provides several sampling strategies (for example sequential, random, and weighted sampling).
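To make the roles of these four tools concrete, here is a minimal sketch. The `SquaresDataset` class and its toy data are hypothetical and exist only for illustration; the batch sizes, split lengths, and seed are arbitrary choices, not recommendations.

```python
import torch
from torch.utils.data import Dataset, DataLoader, random_split, SequentialSampler


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i * i)."""

    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = self.x ** 2

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (input, target) pair.
        return self.x[idx], self.y[idx]


dataset = SquaresDataset(10)

# random_split: two non-overlapping subsets of lengths 8 and 2.
# The seeded generator only makes the split reproducible.
generator = torch.Generator().manual_seed(0)
train_set, val_set = random_split(dataset, [8, 2], generator=generator)

# DataLoader: batched, shuffled iteration over the training subset.
# num_workers=0 loads in the main process; a larger value enables parallel loading.
train_loader = DataLoader(train_set, batch_size=4, shuffle=True, num_workers=0)
batch_shapes = [tuple(xb.shape) for xb, yb in train_loader]

# Sampler: an explicit sampler controls the order in which indices are drawn.
ordered_loader = DataLoader(dataset, batch_size=5, sampler=SequentialSampler(dataset))
first_x, first_y = next(iter(ordered_loader))
```

Note that `shuffle=True` and an explicit `sampler` are alternatives: passing both to the same `DataLoader` raises an error.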
torchvision, in the middle of the figure, is PyTorch's toolkit for computer vision. It has four main modules:
- datasets: Provides loaders for commonly used datasets. By design, they inherit from torch.utils.data.Dataset.
- models: Provides classic deep learning network architectures, together with trained weights if pretrained=True is selected (recent torchvision releases use a `weights` argument for this instead).
- transforms: Provides common data preprocessing operations, mainly on Tensor and PIL Image objects.
- utils: Contains two functions. make_grid stitches multiple images into a single grid, and save_image saves a tensor as an image file.