When training a neural network, we usually want to process the data in batches, and we also need to shuffle the data and load it in parallel for speed. PyTorch provides the DataLoader class to handle all of this.
The DataLoader constructor is defined as follows:
DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
num_workers=0, collate_fn=default_collate, pin_memory=False,
drop_last=False)
dataset: the dataset to load from (a Dataset object)
batch_size: number of samples per batch
shuffle: whether to shuffle the data at every epoch
sampler: custom sampling strategy; covered in detail later
num_workers: number of worker processes used for loading; 0 means the data is loaded in the main process (no multiprocessing)
collate_fn: how multiple samples are combined into one batch; the default usually suffices
pin_memory: whether to store the loaded data in pinned (page-locked) memory; transferring pinned data to the GPU is faster
drop_last: when the dataset size is not an integer multiple of batch_size, setting drop_last to True discards the final incomplete batch
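The interaction of batch_size and drop_last can be seen on a toy dataset. This is a minimal sketch (it assumes torch is installed and uses TensorDataset to fabricate 10 dummy samples, which are not part of the original example):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# A toy dataset of 10 samples, 1 feature each, with integer labels
data = torch.arange(10).float().unsqueeze(1)
labels = torch.arange(10)
ds = TensorDataset(data, labels)

# batch_size=4 over 10 samples: 3 batches of sizes 4, 4, 2 when drop_last=False
loader = DataLoader(ds, batch_size=4, shuffle=False, drop_last=False)
sizes = [x.shape[0] for x, y in loader]
print(sizes)        # [4, 4, 2]

# With drop_last=True the final incomplete batch is discarded: only 2 batches
loader = DataLoader(ds, batch_size=4, shuffle=False, drop_last=True)
print(len(loader))  # 2
```

With shuffle=True the order of samples changes each epoch, but the batch sizes follow the same rule.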
def main():
    import time

    import visdom
    from torch.utils.data import DataLoader

    viz = visdom.Visdom()

    db = Pokemon('pokeman', 224, 'train')

    # Inspect a single sample
    x, y = next(iter(db))
    print('sample:', x.shape, y.shape, y)
    viz.image(db.denormalize(x), win='sample_x', opts=dict(title='sample_x'))

    # Load the dataset 32 samples at a time, in random order
    loader = DataLoader(db, batch_size=32, shuffle=True)
    for x, y in loader:
        viz.images(db.denormalize(x), nrow=8, win='batch', opts=dict(title='batch'))
        viz.text(str(y.numpy()), win='label', opts=dict(title='batch-y'))
        time.sleep(10)
PyTorch data processing: defining your own dataset