Solving the PyTorch DataLoader error: Trying to resize storage that is not resizable

TL;DR

If you run into this problem, especially when the code normally runs fine and only starts failing after you switch datasets, the dataset itself is very likely at fault. Debug along those lines.

Problem Description

The DataLoader raises the following error whenever num_workers is set to a number greater than 0:

Traceback (most recent call last):
  File "/home/username/distort/main.py", line 131, in <module>
    model, perms, accs = train_model(dinfos, args.mid, args.pretrained, args.num_classes, args.treps, args.testep, args.test_dist, device, args.distort)
  File "/home/username/distort/main.py", line 65, in train_model
    for img, y in train_dataloader:
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 175, in default_collate
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 175, in <listcomp>
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 140, in default_collate
    out = elem.new(storage).resize_(len(batch), *list(elem.size()))
RuntimeError: Trying to resize storage that is not resizable

When num_workers is set to 0, a different error appears instead:

Traceback (most recent call last):
  File "/home/username/distort/main.py", line 130, in <module>
    model, perms, accs = train_model(dinfos, args.mid, args.pretrained, args.num_classes, args.treps, args.testep, args.test_dist, device, args.distort)
  File "/home/username/distort/main.py", line 64, in train_model
    for img, y in train_dataloader:
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 175, in default_collate
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 175, in <listcomp>
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 141, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 64, 64] at entry 0 and [1, 64, 64] at entry 32

Troubleshooting

The second error is the easier one to troubleshoot. Add a check to the __getitem__() method of the custom dataset class: whenever the loaded tensor has shape[0] equal to 1, print the path of the source file it came from, as in the sketch below.
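
A minimal sketch of that debugging step, assuming a hypothetical custom dataset that keeps a list of image file paths and loads them with torchvision; the names TinyImageNetDataset, self.paths, and self.labels are illustrative, not the original code:

import torch
from PIL import Image
from torchvision import transforms

class TinyImageNetDataset(torch.utils.data.Dataset):
    def __init__(self, paths, labels):
        self.paths = paths              # list of image file paths
        self.labels = labels            # list of integer class labels
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = self.to_tensor(Image.open(self.paths[idx]))
        # Debug: report any sample that is not 3-channel so the
        # offending file can be located on disk.
        if img.shape[0] == 1:
            print(f"single-channel image: {self.paths[idx]}")
        return img, self.labels[idx]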

It turned out that the dataset (I used tiny-imagenet-200) really does contain an image with only one channel; I had not expected the dataset itself to be the culprit.

Solution

Use the tensor's expand() method inside __getitem__(): for tensors with the wrong number of channels, call expand(3, -1, -1) to replicate the single channel, as in the sketch below. After this change the dataset loads normally whether num_workers is 0 or any positive number.
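
A minimal standalone sketch of the fix; the helper name ensure_three_channels is mine, but the expand(3, -1, -1) call is the one described above:

import torch

def ensure_three_channels(img: torch.Tensor) -> torch.Tensor:
    # Replicate a single grayscale channel so every sample is [3, H, W];
    # expand() returns a broadcasted view without copying the data.
    if img.shape[0] == 1:
        img = img.expand(3, -1, -1)
    return img

# Example: a [1, 64, 64] grayscale tensor becomes [3, 64, 64]
gray = torch.rand(1, 64, 64)
print(ensure_three_channels(gray).shape)   # torch.Size([3, 64, 64])

Inside __getitem__() the same check is applied to img right before it is returned.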

Also note that some blog posts claim num_workers has to match the number of GPUs. That reasoning does not hold: as the first traceback above shows, the failure never touches the CUDA libraries, so it cannot be a GPU-related problem. For the usual way of loading datasets, num_workers sets the maximum number of CPU worker processes the DataLoader spawns.
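
For reference, a toy example of how num_workers is passed to the DataLoader; the random TensorDataset is only there to make the snippet self-contained and stands in for a real dataset:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative stand-in for a real dataset: random [3, 64, 64] "images"
train_dataset = TensorDataset(torch.rand(256, 3, 64, 64),
                              torch.randint(0, 200, (256,)))

# num_workers controls how many CPU worker processes load and collate
# batches in parallel; it has nothing to do with the number of GPUs.
train_dataloader = DataLoader(train_dataset,
                              batch_size=64,
                              shuffle=True,
                              num_workers=4)

for img, y in train_dataloader:
    pass  # training step would go here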

Origin blog.csdn.net/weixin_45346743/article/details/128271203