Accelerating data loading in PyTorch

Unlike MXNet with its RecordIO files, PyTorch reads a huge number of tiny image files one by one, which is painfully slow: loading the data basically became the bottleneck of training. I tried LMDB, which is indeed fast, but it does not support random shuffling during training, so I eventually gave it up.

After suffering with this for quite a while and trying all sorts of ideas, here is a brief summary to pick up from later:

1. DALI - the most reliable solution, and it provides data augmentation as well

Install it with pip following the reference link, write a few lines of code per the manual, and usage is essentially the same as the built-in PyTorch DataLoader. It supports data augmentation and covers the common needs; look up the exact op names in the docs for specifics. One catch: randomness must come from DALI's own Uniform and CoinFlip ops. If you instead draw a random number with Python's random and write something like `if _rand == 1`, that number is generated only once, when the graph is defined: if it happens to be 1, the augmentation is applied to every sample for the entire training run, and if it is 0, it is never applied. This cost me a whole night; after a careful read of the manual and a GitHub issue I fixed it as follows, and the per-sample randomness works.

import nvidia.dali.ops as dali_ops
import nvidia.dali.types as dali_types
from nvidia.dali.pipeline import Pipeline

class reader_pipeline(Pipeline):
    def __init__(self, image_dir, batch_size, num_threads, device_id):
        super(reader_pipeline, self).__init__(batch_size, num_threads, device_id)
        self.input = dali_ops.FileReader(file_root = image_dir, random_shuffle = False)
        self.decode = dali_ops.ImageDecoder(device = 'mixed', output_type = dali_types.RGB)
       
        self.cmn_img = dali_ops.CropMirrorNormalize(device = "gpu",
                                           crop=(112, 112),  crop_pos_x=0, crop_pos_y=0,
                                           output_dtype = dali_types.FLOAT, image_type=dali_types.RGB,
                                           mean=[0.5*255, 0.5*255, 0.5*255],
                                           std=[0.5*255, 0.5*255, 0.5*255]
                                           )
       
        self.brightness_change = dali_ops.Uniform(range=(0.6,1.4))
        self.rd_bright = dali_ops.Brightness(device="gpu")
        self.contrast_change = dali_ops.Uniform(range=(0.6,1.4))
        self.rd_contrast = dali_ops.Contrast(device = "gpu")
        self.saturation_change = dali_ops.Uniform(range = (0.6, 1.4))
        self.rd_saturation = dali_ops.Saturation(device = "gpu")
        self.jitter_change = dali_ops.Uniform(range = (1, 2))
        self.rd_jitter = dali_ops.Jitter(device = "gpu")
        self.disturb = dali_ops.CoinFlip(probability = 0.3)       # returns 1 with probability 0.3, else 0
        self.hue_change = dali_ops.Uniform(range = (-30, 30))     # random value between -30 and 30
        self.hue = dali_ops.Hue(device = "gpu")
       
    def define_graph(self):
        jpegs, labels = self.input(name="Reader")
        images = self.decode(jpegs)
        brightness = self.brightness_change()
        images = self.rd_bright(images, brightness=brightness)
        contrast = self.contrast_change()
        images = self.rd_contrast(images, contrast = contrast)
        saturation = self.saturation_change()
        images = self.rd_saturation(images, saturation = saturation)
        jitter = self.jitter_change()
        disturb = self.disturb()
        images = self.rd_jitter(images, mask = disturb)
        hue = self.hue_change()
        images = self.hue(images, hue = hue)
       
        imgs = self.cmn_img(images)
        return (imgs, labels)
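Once the pipeline builds, it can be wrapped in DALI's PyTorch plugin iterator and consumed much like a normal DataLoader. A minimal sketch, assuming the `reader_pipeline` class above and the pre-1.0 DALI API this post uses (the function name and the `"data"`/`"label"` keys are my own choices, not from the post):

```python
def make_dali_loader(image_dir, batch_size, num_threads=2, device_id=0):
    """Build the reader_pipeline above and wrap it in a PyTorch-style iterator.
    The DALI import is deferred so this sketch parses without DALI installed."""
    from nvidia.dali.plugin.pytorch import DALIGenericIterator

    pipe = reader_pipeline(image_dir, batch_size, num_threads, device_id)
    pipe.build()
    # "Reader" must match the name= given to self.input() in define_graph
    return DALIGenericIterator([pipe], ["data", "label"],
                               size=pipe.epoch_size("Reader"))

# usage sketch:
# for batch in make_dali_loader("/path/to/train", batch_size=120):
#     images = batch[0]["data"]    # already decoded, normalized, on the GPU
#     labels = batch[0]["label"].squeeze().long().cuda()
```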

Errors encountered and fixes:

1.1 AttributeError: module 'nvidia.dali.ops' has no attribute 'ImageDecoder' - the installed DALI version does not match.

import torch
print(torch.version.cuda)

Then install the DALI build that matches the printed CUDA version.
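For example (a sketch only - the wheel name changes between DALI releases, so check NVIDIA's install guide; `nvidia-dali-cuda100` here is just the CUDA 10.x example):

```shell
# print the CUDA version this PyTorch build was compiled against
python -c "import torch; print(torch.version.cuda)"
# then install the matching DALI wheel from NVIDIA's package index
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
```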

1.2 RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'dali::CUDAError'
  what():  CUDA runtime API error cudaErrorIllegalAddress (77):
an illegal memory access was encountered

Very strange: with batch_size = 120 it never occurred, but raising it to 240 triggered this error. Removing the data augmentation and changing num_workers from 2 to 1 made it go away; with augmentation enabled, no num_workers setting helped. It looks like a DALI bug, apparently still unresolved.

1.3 As described in the docs, the DALI examples support a classification file list as input, but I still got an error when I used one; not sure whether my list format was the problem.

2. jpeg4py decoding - roughly 30% faster on Ubuntu; installation fails on CentOS

from PIL import Image
import jpeg4py as jpeg

def pil_loader(path):
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGB')

def jpeg4py_loader(path):
    # jpeg4py wraps libjpeg-turbo and decodes straight to a numpy array
    img = jpeg.JPEG(path).decode()
    return Image.fromarray(img)

# inside the Dataset:
def __getitem__(self, index):
    path, target = self.samples[index]
    img = pil_loader(path)
    # img = jpeg4py_loader(path)
    if self.transform is not None:
        img = self.transform(img)
    return img, int(target)

Install it per the GitHub instructions; the required code change is small, so on Ubuntu machines it is a cost-effective option.
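Since jpeg4py may simply be unavailable (e.g. on CentOS, where it fails to install), one way to keep a single code path is to pick the loader at import time. A minimal sketch, with the helper name my own:

```python
import importlib.util

def pick_loader():
    """Return the name of the preferred JPEG loader.

    jpeg4py (backed by libjpeg-turbo) decodes noticeably faster, but it
    does not install everywhere, so fall back to PIL when it is missing.
    """
    if importlib.util.find_spec("jpeg4py") is not None:
        return "jpeg4py"
    return "pil"
```

A Dataset can then dispatch to `jpeg4py_loader` or `pil_loader` based on this value, so the same code runs on both Ubuntu and CentOS boxes.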

3. Mounting memory as a disk (tmpfs) - the speedup was not obvious, but the operation is simple; if you have plenty of RAM, use it

sudo mount -t tmpfs -o size=100g tmpfs /data03/xxx/tmp_data

Mount the tmpfs on a fresh, empty directory first, and only then copy the data into tmp_data. Do not run this command on a folder that already holds data, or the data in it will be gone......
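A safe sequence might look like this (paths follow the post's example; `train_imgs` is a placeholder for your own dataset folder):

```shell
# 1. create a fresh, EMPTY mount point -- never reuse a folder that holds data
mkdir -p /data03/xxx/tmp_data
# 2. mount a RAM-backed filesystem there (keep size below your free RAM)
sudo mount -t tmpfs -o size=100g tmpfs /data03/xxx/tmp_data
# 3. only now copy the training data in; it lives in RAM until unmounted
cp -r /data03/xxx/train_imgs /data03/xxx/tmp_data/
# 4. when done, unmount -- everything under tmp_data disappears
sudo umount /data03/xxx/tmp_data
```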

4. Data prefetching - no obvious gain in my case, though its author claims dramatic effects.....

5. Hand-written multi-threaded data loading - stuck in this pit for several days, finally gave up......

I had previously written code that loads all the training data into memory with multiple threads in one go. With a small training set (about 20,000 images) everything fits in memory, training speed takes off once loading finishes, and the GPU stays fully busy. I spent a day on it thinking that converting it to online loading would not be hard, but I got stuck for a long time: the symptom was that returning torch tensors from worker threads crashed. Maybe my engineering skill is just not up to it; I stopped pursuing this.

Some storage systems do not cope well with huge numbers of small scattered files, so in the end the answer is probably something along the lines of RecordIO / TFRecord......

Origin www.cnblogs.com/zhengmeisong/p/11995374.html